Overview

Dataset statistics

Number of variables: 26
Number of observations: 14602
Missing cells: 658
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.9 MiB
Average record size in memory: 208.0 B
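
The percentage and size figures above can be cross-checked from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables × observations) and that total size is roughly the average record size times the row count; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 26
n_observations = 14602
missing_cells = 658
avg_record_bytes = 208.0

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size ~ average record size * number of rows (1 MiB = 1024**2 bytes).
total_mib = avg_record_bytes * n_observations / 1024**2

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"approx. size in memory: {total_mib:.1f} MiB")
```

Under these assumptions the results round to the reported 0.2% and 2.9 MiB.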

Variable types

Numeric: 9
Categorical: 17

Alerts

Name has a high cardinality: 13485 distinct values [High cardinality]
Address has a high cardinality: 5431 distinct values [High cardinality]
StreetName has a high cardinality: 572 distinct values [High cardinality]
UnitNo has a high cardinality: 1721 distinct values [High cardinality]
PostalCode has a high cardinality: 2497 distinct values [High cardinality]
Location has a high cardinality: 56 distinct values [High cardinality]
NAICSTitle has a high cardinality: 681 distinct values [High cardinality]
Phone has a high cardinality: 14000 distinct values [High cardinality]
Fax has a high cardinality: 9400 distinct values [High cardinality]
TollFree has a high cardinality: 1952 distinct values [High cardinality]
EMail has a high cardinality: 8264 distinct values [High cardinality]
WebAddress has a high cardinality: 8810 distinct values [High cardinality]
EmplUpdate has a high cardinality: 299 distinct values [High cardinality]
X is highly overall correlated with Y and 5 other fields [High correlation]
Y is highly overall correlated with X and 5 other fields [High correlation]
FID is highly overall correlated with BusinessID and 1 other fields [High correlation]
BusinessID is highly overall correlated with FID and 1 other fields [High correlation]
StreetNo is highly overall correlated with X and 5 other fields [High correlation]
Ward is highly overall correlated with X and 5 other fields [High correlation]
NAICSCode is highly overall correlated with NAICSDescr [High correlation]
CENT_X is highly overall correlated with X and 5 other fields [High correlation]
CENT_Y is highly overall correlated with X and 5 other fields [High correlation]
Location is highly overall correlated with X and 9 other fields [High correlation]
NAICSDescr is highly overall correlated with Location and 2 other fields [High correlation]
BldgNo is highly overall correlated with Location [High correlation]
Sector_Des is highly overall correlated with NAICSDescr [High correlation]
EmplRange has 567 (3.9%) missing values [Missing]
FID is uniformly distributed [Uniform]
Name is uniformly distributed [Uniform]
FID has unique values [Unique]
BusinessID has unique values [Unique]
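
The cardinality and uniqueness alerts above can be read as simple threshold rules. The sketch below is an illustration only: the function name `column_alerts` is hypothetical, and the cut-off of 50 distinct values is an assumption that happens to be consistent with this report (Location, with 56 distinct values, is flagged; BldgNo, with 46, is not). It is not read from the report's configuration.

```python
def column_alerts(n_distinct, n_rows, categorical, threshold=50):
    """Return alert labels for one column.

    threshold=50 is an assumed cut-off, not taken from this report's config.
    """
    alerts = []
    # Assumption: high cardinality is only checked for categorical columns.
    if categorical and n_distinct > threshold:
        alerts.append("High cardinality")
    # A column is unique when every row holds a different value.
    if n_distinct == n_rows:
        alerts.append("Unique")
    return alerts


n_rows = 14602
print(column_alerts(56, n_rows, categorical=True))      # Location
print(column_alerts(46, n_rows, categorical=True))      # BldgNo
print(column_alerts(14602, n_rows, categorical=False))  # BusinessID
```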

Reproduction

Analysis started: 2023-01-31 22:33:12.238039
Analysis finished: 2023-01-31 22:33:34.163269
Duration: 21.93 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
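
The reported duration is the difference between the two timestamps above. A minimal sketch of that check using only the standard library (the format string is chosen here to match the timestamps as printed):

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction section.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-01-31 22:33:12.238039", FMT)
finished = datetime.strptime("2023-01-31 22:33:34.163269", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```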

Variables

X
Real number (ℝ)

Distinct: 4283
Distinct (%): 29.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -79.65342
Minimum: -79.80298
Maximum: -79.550935
Zeros: 0
Zeros (%): 0.0%
Negative: 14602
Negative (%): 100.0%
Memory size: 114.2 KiB

Quantile statistics

Minimum: -79.80298
5-th percentile: -79.743132
Q1: -79.680557
Median: -79.650434
Q3: -79.619498
95-th percentile: -79.577579
Maximum: -79.550935
Range: 0.25204547
Interquartile range (IQR): 0.061058994

Descriptive statistics

Standard deviation: 0.047759191
Coefficient of variation (CV): -0.00059958745
Kurtosis: -0.06577376
Mean: -79.65342
Median Absolute Deviation (MAD): 0.030394402
Skewness: -0.43242004
Sum: -1163099.2
Variance: 0.0022809403
Monotonicity: Not monotonic
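
Several of the statistics for X are derived from others in the same listing. The sketch below recomputes them from the rounded values printed above, assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation²). Small differences from the reported figures are expected because the inputs are rounded.

```python
import math

# Values copied from the X statistics above (already rounded by the report).
minimum, maximum = -79.80298, -79.550935
q1, q3 = -79.680557, -79.619498
mean, std = -79.65342, 0.047759191

value_range = maximum - minimum   # reported: 0.25204547
iqr = q3 - q1                     # reported: 0.061058994
cv = std / mean                   # reported: -0.00059958745
variance = std ** 2               # reported: 0.0022809403

# Compare against the reported figures with a loose relative tolerance.
print(math.isclose(value_range, 0.25204547, rel_tol=1e-4))
print(math.isclose(iqr, 0.061058994, rel_tol=1e-4))
print(math.isclose(cv, -0.00059958745, rel_tol=1e-4))
print(math.isclose(variance, 0.0022809403, rel_tol=1e-4))
```

The coefficient of variation is negative only because the mean is negative (the values look like longitudes), so its sign carries no information about spread here.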
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
-79.64275968          185     1.3%
-79.60364656          123     0.8%
-79.71222857          113     0.8%
-79.63864759          107     0.7%
-79.56936408          91      0.6%
-79.70175576          56      0.4%
-79.60455904          53      0.4%
-79.75938361          51      0.3%
-79.65320697          50      0.3%
-79.70562159          50      0.3%
Other values (4273)   13723   94.0%

Minimum 10 values

Value                 Count   Frequency (%)
-79.80298035          1       < 0.1%
-79.8014612           1       < 0.1%
-79.79447393          1       < 0.1%
-79.79439767          1       < 0.1%
-79.78884298          1       < 0.1%
-79.78871792          20      0.1%
-79.78850259          1       < 0.1%
-79.78675536          5       < 0.1%
-79.78630211          12      0.1%
-79.78452433          11      0.1%

Maximum 10 values

Value                 Count   Frequency (%)
-79.55093488          4       < 0.1%
-79.55280776          1       < 0.1%
-79.55341309          1       < 0.1%
-79.55391093          1       < 0.1%
-79.55445215          1       < 0.1%
-79.55472553          2       < 0.1%
-79.55507028          1       < 0.1%
-79.55523334          1       < 0.1%
-79.55532738          1       < 0.1%
-79.55542565          1       < 0.1%
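
The "Other values" row in the common-values listing for X is the remainder after the ten most frequent values are removed. A small sketch of that bookkeeping, using counts copied from the report (variable names are illustrative):

```python
# Counts of the ten most frequent X values, as listed in the report.
top_counts = [185, 123, 113, 107, 91, 56, 53, 51, 50, 50]
n_rows = 14602      # number of observations
n_distinct = 4283   # distinct X values

other_count = n_rows - sum(top_counts)          # rows not covered by the top ten
other_distinct = n_distinct - len(top_counts)   # distinct values not shown
other_pct = 100 * other_count / n_rows

print(f"Other values ({other_distinct}): {other_count} rows, {other_pct:.1f}%")
```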

Y
Real number (ℝ)

Distinct: 4283
Distinct (%): 29.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.611206
Minimum: 43.48517
Maximum: 43.732864
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 114.2 KiB

Quantile statistics

Minimum: 43.48517
5-th percentile: 43.522362
Q1: 43.578243
Median: 43.607936
Q3: 43.649243
95-th percentile: 43.698595
Maximum: 43.732864
Range: 0.24769358
Interquartile range (IQR): 0.071000798

Descriptive statistics

Standard deviation: 0.050904188
Coefficient of variation (CV): 0.0011672273
Kurtosis: -0.59031452
Mean: 43.611206
Median Absolute Deviation (MAD): 0.035283483
Skewness: 0.014652089
Sum: 636810.82
Variance: 0.0025912364
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
43.59351505           185     1.3%
43.67999884           123     0.8%
43.55837136           113     0.8%
43.72011759           107     0.7%
43.5935916            91      0.6%
43.56223751           56      0.4%
43.62508971           53      0.4%
43.58207115           51      0.3%
43.53041418           50      0.3%
43.55906534           50      0.3%
Other values (4273)   13723   94.0%

Minimum 10 values

Value                 Count   Frequency (%)
43.48517014           1       < 0.1%
43.48968489           1       < 0.1%
43.4915708            1       < 0.1%
43.49199992           2       < 0.1%
43.49224252           1       < 0.1%
43.49454092           1       < 0.1%
43.49517064           1       < 0.1%
43.49608236           1       < 0.1%
43.49636475           1       < 0.1%
43.49652992           2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
43.73286372           10      0.1%
43.73233211           1       < 0.1%
43.73196635           1       < 0.1%
43.73068152           1       < 0.1%
43.72935757           1       < 0.1%
43.72770692           1       < 0.1%
43.72552272           2       < 0.1%
43.72537511           1       < 0.1%
43.7250583            1       < 0.1%
43.7248112            2       < 0.1%

FID
Real number (ℝ)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 14602
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7301.5
Minimum: 1
Maximum: 14602
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 114.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 731.05
Q1: 3651.25
Median: 7301.5
Q3: 10951.75
95-th percentile: 13871.95
Maximum: 14602
Range: 14601
Interquartile range (IQR): 7300.5

Descriptive statistics

Standard deviation: 4215.3787
Coefficient of variation (CV): 0.5773305
Kurtosis: -1.2
Mean: 7301.5
Median Absolute Deviation (MAD): 3650.5
Skewness: 0
Sum: 1.066165 × 10^8
Variance: 17769417
Monotonicity: Strictly increasing
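
FID is unique, strictly increasing, and runs from 1 to 14602, so it behaves like a plain row counter 1..n. Under that assumption, the statistics above match the closed forms for consecutive integers. The sketch assumes the sample variance (n − 1 denominator) and linear interpolation for percentiles; both are assumptions about how the report computes them, though the numbers agree.

```python
import math

n = 14602  # number of observations; FID is assumed to be exactly 1..n

mean = (n + 1) / 2                  # reported: 7301.5
total = n * (n + 1) // 2            # reported: 1.066165 × 10^8
sample_variance = n * (n + 1) / 12  # reported: 17769417
std = math.sqrt(sample_variance)    # reported: 4215.3787
p5 = 1 + 0.05 * (n - 1)             # reported: 731.05 (linear interpolation)

# Sanity check of the closed-form sum against a direct sum.
assert total == sum(range(1, n + 1))

print(mean, total, round(sample_variance), round(std, 4), round(p5, 2))
```

The skewness of 0 and kurtosis of -1.2 reported above are also what a discrete uniform sequence of this length would be expected to give.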
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
1                      1       < 0.1%
9726                   1       < 0.1%
9728                   1       < 0.1%
9729                   1       < 0.1%
9730                   1       < 0.1%
9731                   1       < 0.1%
9732                   1       < 0.1%
9733                   1       < 0.1%
9734                   1       < 0.1%
9735                   1       < 0.1%
Other values (14592)   14592   99.9%

Minimum 10 values

Value                  Count   Frequency (%)
1                      1       < 0.1%
2                      1       < 0.1%
3                      1       < 0.1%
4                      1       < 0.1%
5                      1       < 0.1%
6                      1       < 0.1%
7                      1       < 0.1%
8                      1       < 0.1%
9                      1       < 0.1%
10                     1       < 0.1%

Maximum 10 values

Value                  Count   Frequency (%)
14602                  1       < 0.1%
14601                  1       < 0.1%
14600                  1       < 0.1%
14599                  1       < 0.1%
14598                  1       < 0.1%
14597                  1       < 0.1%
14596                  1       < 0.1%
14595                  1       < 0.1%
14594                  1       < 0.1%
14593                  1       < 0.1%

BusinessID
Real number (ℝ)

HIGH CORRELATION
UNIQUE

Distinct: 14602
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30008.365
Minimum: 7
Maximum: 87134
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 114.2 KiB

Quantile statistics

Minimum: 7
5-th percentile: 2160.1
Q1: 9073.25
Median: 17557
Q3: 52151.75
95-th percentile: 85256.9
Maximum: 87134
Range: 87127
Interquartile range (IQR): 43078.5

Descriptive statistics

Standard deviation: 26323.079
Coefficient of variation (CV): 0.87719137
Kurtosis: -0.62367343
Mean: 30008.365
Median Absolute Deviation (MAD): 12630.5
Skewness: 0.80472545
Sum: 4.3818214 × 10^8
Variance: 6.9290448 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                  Count   Frequency (%)
1055                   1       < 0.1%
45834                  1       < 0.1%
47415                  1       < 0.1%
45837                  1       < 0.1%
48798                  1       < 0.1%
23327                  1       < 0.1%
49165                  1       < 0.1%
46886                  1       < 0.1%
47416                  1       < 0.1%
47181                  1       < 0.1%
Other values (14592)   14592   99.9%

Minimum 10 values

Value                  Count   Frequency (%)
7                      1       < 0.1%
10                     1       < 0.1%
12                     1       < 0.1%
16                     1       < 0.1%
18                     1       < 0.1%
20                     1       < 0.1%
21                     1       < 0.1%
23                     1       < 0.1%
33                     1       < 0.1%
35                     1       < 0.1%

Maximum 10 values

Value                  Count   Frequency (%)
87134                  1       < 0.1%
87133                  1       < 0.1%
87132                  1       < 0.1%
87131                  1       < 0.1%
87130                  1       < 0.1%
87129                  1       < 0.1%
87128                  1       < 0.1%
87125                  1       < 0.1%
87124                  1       < 0.1%
87120                  1       < 0.1%

Name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 13485
Distinct (%): 92.4%
Missing: 0
Missing (%): 0.0%
Memory size: 114.2 KiB

Value                  Count
Subway                 40
Tim Horton's           39
Petro Canada           24
Dollarama              19
Shoppers Drug Mart     19
Other values (13480)   14461

Length

Max length: 118
Median length: 72
Mean length: 22.732639
Min length: 2

Characters and Unicode

Total characters: 331942
Distinct characters: 83
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
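
As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. This is a sketch of the idea, not necessarily how the report computes its counts; `category_counts` is a hypothetical helper. The module returns two-letter codes, where `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Zs` is Space Separator, and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the business names that appears in this dataset.
counts = category_counts("Tim Horton's")
for code, count in sorted(counts.items()):
    print(code, count)
```

Summing such counters over every value in a column would give per-category totals of the kind listed in the tables that follow.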

Unique

Unique: 13000
Unique (%): 89.0%

Sample

1st row: Golf Trends Inc.
2nd row: Apex Graphics Inc.
3rd row: Sands, John & Associates Limited
4th row: Printmedia-Tackaberry Times
5th row: S W R Industries Ltd.

Common Values

Value                  Count   Frequency (%)
Subway                 40      0.3%
Tim Horton's           39      0.3%
Petro Canada           24      0.2%
Dollarama              19      0.1%
Shoppers Drug Mart     19      0.1%
Pizza Pizza            17      0.1%
Money Mart             14      0.1%
Royal Bank of Canada   14      0.1%
Starbucks              13      0.1%
McDonald's             13      0.1%
Other values (13475)   14390   98.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
inc 2855
 
5.5%
1789
 
3.4%
ltd 1479
 
2.8%
canada 908
 
1.7%
centre 539
 
1.0%
and 474
 
0.9%
services 469
 
0.9%
a 441
 
0.8%
the 428
 
0.8%
dr 425
 
0.8%
Other values (11540) 42520
81.3%

Most occurring characters

ValueCountFrequency (%)
37790
 
11.4%
e 24983
 
7.5%
a 24143
 
7.3%
n 21512
 
6.5%
i 19835
 
6.0%
r 19262
 
5.8%
o 18194
 
5.5%
t 17786
 
5.4%
s 14564
 
4.4%
l 11792
 
3.6%
Other values (73) 122081
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 232210
70.0%
Uppercase Letter 51325
 
15.5%
Space Separator 37790
 
11.4%
Other Punctuation 8528
 
2.6%
Dash Punctuation 821
 
0.2%
Decimal Number 736
 
0.2%
Close Punctuation 250
 
0.1%
Open Punctuation 248
 
0.1%
Math Symbol 31
 
< 0.1%
Control 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 24983
10.8%
a 24143
10.4%
n 21512
9.3%
i 19835
8.5%
r 19262
 
8.3%
o 18194
 
7.8%
t 17786
 
7.7%
s 14564
 
6.3%
l 11792
 
5.1%
c 11206
 
4.8%
Other values (17) 48933
21.1%
Uppercase Letter
ValueCountFrequency (%)
C 6644
12.9%
S 5367
 
10.5%
I 4351
 
8.5%
M 3481
 
6.8%
L 3403
 
6.6%
A 3144
 
6.1%
P 3141
 
6.1%
T 2879
 
5.6%
D 2641
 
5.1%
B 2072
 
4.0%
Other values (16) 14202
27.7%
Other Punctuation
ValueCountFrequency (%)
. 5576
65.4%
& 1392
 
16.3%
, 704
 
8.3%
' 668
 
7.8%
/ 150
 
1.8%
: 14
 
0.2%
# 6
 
0.1%
! 6
 
0.1%
" 6
 
0.1%
@ 5
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 162
22.0%
2 141
19.2%
0 131
17.8%
4 80
10.9%
3 60
 
8.2%
9 51
 
6.9%
8 35
 
4.8%
5 28
 
3.8%
7 24
 
3.3%
6 24
 
3.3%
Close Punctuation
ValueCountFrequency (%)
) 248
99.2%
] 2
 
0.8%
Math Symbol
ValueCountFrequency (%)
+ 25
80.6%
| 6
 
19.4%
Space Separator
ValueCountFrequency (%)
37790
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 821
100.0%
Open Punctuation
ValueCountFrequency (%)
( 248
100.0%
Control
ValueCountFrequency (%)
2
100.0%
Other Symbol
ValueCountFrequency (%)
© 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 283535
85.4%
Common 48407
 
14.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 24983
 
8.8%
a 24143
 
8.5%
n 21512
 
7.6%
i 19835
 
7.0%
r 19262
 
6.8%
o 18194
 
6.4%
t 17786
 
6.3%
s 14564
 
5.1%
l 11792
 
4.2%
c 11206
 
4.0%
Other values (43) 100258
35.4%
Common
ValueCountFrequency (%)
37790
78.1%
. 5576
 
11.5%
& 1392
 
2.9%
- 821
 
1.7%
, 704
 
1.5%
' 668
 
1.4%
( 248
 
0.5%
) 248
 
0.5%
1 162
 
0.3%
/ 150
 
0.3%
Other values (20) 648
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 331887
> 99.9%
None 55
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
37790
 
11.4%
e 24983
 
7.5%
a 24143
 
7.3%
n 21512
 
6.5%
i 19835
 
6.0%
r 19262
 
5.8%
o 18194
 
5.5%
t 17786
 
5.4%
s 14564
 
4.4%
l 11792
 
3.6%
Other values (71) 122026
36.8%
None
ValueCountFrequency (%)
é 54
98.2%
© 1
 
1.8%

Address
Categorical

Distinct5431
Distinct (%)37.2%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
100 City Centre Dr
 
185
7205 Goreway Dr
 
106
5100 Erin Mills Pky
 
106
1250 South Service Rd
 
91
2000 Credit Valley Rd
 
56
Other values (5426)
14058 

Length

Max length31
Median length26
Mean length16.654294
Min length5

Characters and Unicode

Total characters243186
Distinct characters64
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3632 ?
Unique (%)24.9%

Sample

1st row300 Ambassador Dr
2nd row320 Ambassador Dr
3rd row320 Ambassador Dr
4th row320 Ambassador Dr
5th row321 Ambassador Dr

Common Values

ValueCountFrequency (%)
100 City Centre Dr 185
 
1.3%
7205 Goreway Dr 106
 
0.7%
5100 Erin Mills Pky 106
 
0.7%
1250 South Service Rd 91
 
0.6%
2000 Credit Valley Rd 56
 
0.4%
4141 Dixie Rd 53
 
0.4%
2225 Erin Mills Pky 50
 
0.3%
2300 Eglinton Ave W 50
 
0.3%
377 Burnhamthorpe Rd E 45
 
0.3%
1550 South Gateway Rd 44
 
0.3%
Other values (5421) 13816
94.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
rd 5323
 
10.8%
dr 3244
 
6.6%
e 2230
 
4.5%
st 1981
 
4.0%
blvd 1468
 
3.0%
w 1441
 
2.9%
dundas 912
 
1.8%
ave 751
 
1.5%
lakeshore 511
 
1.0%
pky 491
 
1.0%
Other values (3352) 31078
62.9%

Most occurring characters

ValueCountFrequency (%)
34828
 
14.3%
r 14366
 
5.9%
e 13629
 
5.6%
a 10936
 
4.5%
d 10366
 
4.3%
0 9571
 
3.9%
n 9379
 
3.9%
t 9065
 
3.7%
5 8955
 
3.7%
i 8428
 
3.5%
Other values (54) 113663
46.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 119434
49.1%
Decimal Number 53676
22.1%
Uppercase Letter 35143
 
14.5%
Space Separator 34828
 
14.3%
Dash Punctuation 91
 
< 0.1%
Other Punctuation 11
 
< 0.1%
Modifier Symbol 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 14366
12.0%
e 13629
11.4%
a 10936
9.2%
d 10366
8.7%
n 9379
 
7.9%
t 9065
 
7.6%
i 8428
 
7.1%
o 6941
 
5.8%
l 6102
 
5.1%
s 5190
 
4.3%
Other values (15) 25032
21.0%
Uppercase Letter
ValueCountFrequency (%)
R 5889
16.8%
D 5292
15.1%
S 3653
10.4%
E 3086
8.8%
B 2678
7.6%
C 2520
7.2%
W 2270
 
6.5%
M 1776
 
5.1%
A 1700
 
4.8%
T 1175
 
3.3%
Other values (14) 5104
14.5%
Decimal Number
ValueCountFrequency (%)
0 9571
17.8%
5 8955
16.7%
1 7871
14.7%
2 5821
10.8%
3 4798
8.9%
6 4332
8.1%
7 3818
 
7.1%
4 3220
 
6.0%
9 2680
 
5.0%
8 2610
 
4.9%
Other Punctuation
ValueCountFrequency (%)
' 10
90.9%
. 1
 
9.1%
Space Separator
ValueCountFrequency (%)
34828
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 91
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 154577
63.6%
Common 88609
36.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 14366
 
9.3%
e 13629
 
8.8%
a 10936
 
7.1%
d 10366
 
6.7%
n 9379
 
6.1%
t 9065
 
5.9%
i 8428
 
5.5%
o 6941
 
4.5%
l 6102
 
3.9%
R 5889
 
3.8%
Other values (39) 59476
38.5%
Common
ValueCountFrequency (%)
34828
39.3%
0 9571
 
10.8%
5 8955
 
10.1%
1 7871
 
8.9%
2 5821
 
6.6%
3 4798
 
5.4%
6 4332
 
4.9%
7 3818
 
4.3%
4 3220
 
3.6%
9 2680
 
3.0%
Other values (5) 2715
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 243186
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
34828
 
14.3%
r 14366
 
5.9%
e 13629
 
5.6%
a 10936
 
4.5%
d 10366
 
4.3%
0 9571
 
3.9%
n 9379
 
3.9%
t 9065
 
3.7%
5 8955
 
3.7%
i 8428
 
3.5%
Other values (54) 113663
46.7%

StreetNo
Real number (ℝ)

Distinct: 2763
Distinct (%): 18.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2898.7194
Minimum: 1
Maximum: 7895
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 114.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 60
Q1: 1031
Median: 2355
Q3: 5100
95-th percentile: 7065
Maximum: 7895
Range: 7894
Interquartile range (IQR): 4069

Descriptive statistics

Standard deviation: 2337.1037
Coefficient of variation (CV): 0.80625386
Kurtosis: -1.0013516
Mean: 2898.7194
Median Absolute Deviation (MAD): 1635
Skewness: 0.55271973
Sum: 42327100
Variance: 5462053.6
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
100 214
 
1.5%
5100 121
 
0.8%
7205 114
 
0.8%
1250 99
 
0.7%
2000 93
 
0.6%
1 79
 
0.5%
1550 61
 
0.4%
4141 61
 
0.4%
1100 57
 
0.4%
50 56
 
0.4%
Other values (2753) 13647
93.5%
ValueCountFrequency (%)
1 79
0.5%
2 24
 
0.2%
3 46
0.3%
4 28
 
0.2%
5 1
 
< 0.1%
6 6
 
< 0.1%
7 5
 
< 0.1%
8 3
 
< 0.1%
9 4
 
< 0.1%
10 29
 
0.2%
ValueCountFrequency (%)
7895 28
0.2%
7890 1
 
< 0.1%
7885 12
0.1%
7880 2
 
< 0.1%
7875 4
 
< 0.1%
7860 1
 
< 0.1%
7855 1
 
< 0.1%
7850 1
 
< 0.1%
7830 1
 
< 0.1%
7825 1
 
< 0.1%

StreetName
Categorical

Distinct572
Distinct (%)3.9%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
Dundas St E
 
585
Hurontario St
 
437
Matheson Blvd E
 
386
Dixie Rd
 
366
Lakeshore Rd E
 
339
Other values (567)
12489 

Length

Max length26
Median length21
Mean length11.977058
Min length3

Characters and Unicode

Total characters174889
Distinct characters53
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique154 ?
Unique (%)1.1%

Sample

1st rowAmbassador Dr
2nd rowAmbassador Dr
3rd rowAmbassador Dr
4th rowAmbassador Dr
5th rowAmbassador Dr

Common Values

ValueCountFrequency (%)
Dundas St E 585
 
4.0%
Hurontario St 437
 
3.0%
Matheson Blvd E 386
 
2.6%
Dixie Rd 366
 
2.5%
Lakeshore Rd E 339
 
2.3%
Dundas St W 325
 
2.2%
City Centre Dr 282
 
1.9%
Britannia Rd E 245
 
1.7%
Tomken Rd 241
 
1.7%
Argentia Rd 240
 
1.6%
Other values (562) 11156
76.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
rd 5323
 
15.3%
dr 3243
 
9.3%
e 2230
 
6.4%
st 1981
 
5.7%
blvd 1468
 
4.2%
w 1441
 
4.1%
dundas 912
 
2.6%
ave 751
 
2.2%
lakeshore 511
 
1.5%
pky 491
 
1.4%
Other values (588) 16476
47.3%

Most occurring characters

ValueCountFrequency (%)
20225
 
11.6%
r 14353
 
8.2%
e 13629
 
7.8%
a 10936
 
6.3%
d 10365
 
5.9%
n 9379
 
5.4%
t 9064
 
5.2%
i 8428
 
4.8%
o 6941
 
4.0%
l 6102
 
3.5%
Other values (43) 65467
37.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 119418
68.3%
Uppercase Letter 35144
 
20.1%
Space Separator 20225
 
11.6%
Dash Punctuation 91
 
0.1%
Other Punctuation 11
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 14353
12.0%
e 13629
11.4%
a 10936
9.2%
d 10365
8.7%
n 9379
 
7.9%
t 9064
 
7.6%
i 8428
 
7.1%
o 6941
 
5.8%
l 6102
 
5.1%
s 5190
 
4.3%
Other values (15) 25031
21.0%
Uppercase Letter
ValueCountFrequency (%)
R 5889
16.8%
D 5292
15.1%
S 3653
10.4%
E 3086
8.8%
B 2678
7.6%
C 2520
7.2%
W 2270
 
6.5%
M 1776
 
5.1%
A 1700
 
4.8%
T 1176
 
3.3%
Other values (14) 5104
14.5%
Other Punctuation
ValueCountFrequency (%)
' 10
90.9%
. 1
 
9.1%
Space Separator
ValueCountFrequency (%)
20225
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 91
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 154562
88.4%
Common 20327
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 14353
 
9.3%
e 13629
 
8.8%
a 10936
 
7.1%
d 10365
 
6.7%
n 9379
 
6.1%
t 9064
 
5.9%
i 8428
 
5.5%
o 6941
 
4.5%
l 6102
 
3.9%
R 5889
 
3.8%
Other values (39) 59476
38.5%
Common
ValueCountFrequency (%)
20225
99.5%
- 91
 
0.4%
' 10
 
< 0.1%
. 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 174889
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
20225
 
11.6%
r 14353
 
8.2%
e 13629
 
7.8%
a 10936
 
6.3%
d 10365
 
5.9%
n 9379
 
5.4%
t 9064
 
5.2%
i 8428
 
4.8%
o 6941
 
4.0%
l 6102
 
3.5%
Other values (43) 65467
37.4%

BldgNo
Categorical

Distinct46
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
13748 
Bldg 2
 
189
Bldg 1
 
168
Bldg A
 
91
Bldg B
 
76
Other values (41)
 
330

Length

Max length14
Median length1
Mean length1.3000959
Min length1

Characters and Unicode

Total characters18984
Distinct characters41
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique11 ?
Unique (%)0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
13748
94.2%
Bldg 2 189
 
1.3%
Bldg 1 168
 
1.2%
Bldg A 91
 
0.6%
Bldg B 76
 
0.5%
Bldg 3 58
 
0.4%
Bldg 4 49
 
0.3%
Bldg K 32
 
0.2%
Bldg C 17
 
0.1%
Plaza 1 15
 
0.1%
Other values (36) 159
 
1.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
bldg 772
45.6%
2 194
 
11.5%
1 184
 
10.9%
a 94
 
5.6%
b 78
 
4.6%
3 63
 
3.7%
4 61
 
3.6%
plaza 37
 
2.2%
k 32
 
1.9%
tower 21
 
1.2%
Other values (27) 156
 
9.2%

Most occurring characters

ValueCountFrequency (%)
14586
76.8%
B 861
 
4.5%
l 812
 
4.3%
g 781
 
4.1%
d 777
 
4.1%
1 214
 
1.1%
2 205
 
1.1%
A 95
 
0.5%
a 92
 
0.5%
3 63
 
0.3%
Other values (31) 498
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Space Separator 14586
76.8%
Lowercase Letter 2676
 
14.1%
Uppercase Letter 1139
 
6.0%
Decimal Number 583
 
3.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B 861
75.6%
A 95
 
8.3%
P 37
 
3.2%
K 32
 
2.8%
E 22
 
1.9%
T 21
 
1.8%
C 18
 
1.6%
D 14
 
1.2%
W 10
 
0.9%
F 8
 
0.7%
Other values (6) 21
 
1.8%
Lowercase Letter
ValueCountFrequency (%)
l 812
30.3%
g 781
29.2%
d 777
29.0%
a 92
 
3.4%
e 40
 
1.5%
z 37
 
1.4%
r 31
 
1.2%
t 27
 
1.0%
o 24
 
0.9%
s 21
 
0.8%
Other values (4) 34
 
1.3%
Decimal Number
ValueCountFrequency (%)
1 214
36.7%
2 205
35.2%
3 63
 
10.8%
4 61
 
10.5%
7 9
 
1.5%
6 7
 
1.2%
5 7
 
1.2%
0 6
 
1.0%
9 6
 
1.0%
8 5
 
0.9%
Space Separator
ValueCountFrequency (%)
14586
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 15169
79.9%
Latin 3815
 
20.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
B 861
22.6%
l 812
21.3%
g 781
20.5%
d 777
20.4%
A 95
 
2.5%
a 92
 
2.4%
e 40
 
1.0%
P 37
 
1.0%
z 37
 
1.0%
K 32
 
0.8%
Other values (20) 251
 
6.6%
Common
ValueCountFrequency (%)
14586
96.2%
1 214
 
1.4%
2 205
 
1.4%
3 63
 
0.4%
4 61
 
0.4%
7 9
 
0.1%
6 7
 
< 0.1%
5 7
 
< 0.1%
0 6
 
< 0.1%
9 6
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 18984
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14586
76.8%
B 861
 
4.5%
l 812
 
4.3%
g 781
 
4.1%
d 777
 
4.1%
1 214
 
1.1%
2 205
 
1.1%
A 95
 
0.5%
a 92
 
0.5%
3 63
 
0.3%
Other values (31) 498
 
2.6%

UnitNo
Categorical

Distinct1721
Distinct (%)11.8%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
4281 
1
 
505
2
 
412
3
 
344
4
 
332
Other values (1716)
8728 

Length

Max length32
Median length1
Mean length2.4784961
Min length1

Characters and Unicode

Total characters36191
Distinct characters66
Distinct categories9 ?
Distinct scripts2 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1112 ?
Unique (%)7.6%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
4281
29.3%
1 505
 
3.5%
2 412
 
2.8%
3 344
 
2.4%
4 332
 
2.3%
5 294
 
2.0%
6 282
 
1.9%
7 239
 
1.6%
8 225
 
1.5%
9 200
 
1.4%
Other values (1711) 7488
51.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
to 1498
 
10.7%
1 768
 
5.5%
2 630
 
4.5%
3 523
 
3.7%
4 483
 
3.4%
5 449
 
3.2%
6 421
 
3.0%
7 382
 
2.7%
8 364
 
2.6%
9 320
 
2.3%
Other values (1019) 8203
58.4%

Most occurring characters

ValueCountFrequency (%)
8148
22.5%
1 5489
15.2%
2 3559
9.8%
0 3425
9.5%
3 1950
 
5.4%
o 1688
 
4.7%
t 1575
 
4.4%
4 1573
 
4.3%
5 1379
 
3.8%
6 1152
 
3.2%
Other values (56) 6253
17.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 21176
58.5%
Space Separator 8148
 
22.5%
Lowercase Letter 4260
 
11.8%
Uppercase Letter 2115
 
5.8%
Other Punctuation 456
 
1.3%
Open Punctuation 17
 
< 0.1%
Close Punctuation 17
 
< 0.1%
Math Symbol 1
 
< 0.1%
Control 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 1688
39.6%
t 1575
37.0%
l 160
 
3.8%
r 154
 
3.6%
e 140
 
3.3%
n 87
 
2.0%
s 81
 
1.9%
a 73
 
1.7%
d 68
 
1.6%
p 50
 
1.2%
Other values (13) 184
 
4.3%
Uppercase Letter
ValueCountFrequency (%)
A 570
27.0%
B 460
21.7%
C 209
 
9.9%
F 169
 
8.0%
E 128
 
6.1%
D 100
 
4.7%
H 87
 
4.1%
G 69
 
3.3%
L 67
 
3.2%
K 34
 
1.6%
Other values (13) 222
 
10.5%
Decimal Number
ValueCountFrequency (%)
1 5489
25.9%
2 3559
16.8%
0 3425
16.2%
3 1950
 
9.2%
4 1573
 
7.4%
5 1379
 
6.5%
6 1152
 
5.4%
7 985
 
4.7%
8 933
 
4.4%
9 731
 
3.5%
Other Punctuation
ValueCountFrequency (%)
, 270
59.2%
& 177
38.8%
/ 4
 
0.9%
. 4
 
0.9%
1
 
0.2%
Space Separator
ValueCountFrequency (%)
8148
100.0%
Open Punctuation
ValueCountFrequency (%)
( 17
100.0%
Close Punctuation
ValueCountFrequency (%)
) 17
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%
Control
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 29816
82.4%
Latin 6375
 
17.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 1688
26.5%
t 1575
24.7%
A 570
 
8.9%
B 460
 
7.2%
C 209
 
3.3%
F 169
 
2.7%
l 160
 
2.5%
r 154
 
2.4%
e 140
 
2.2%
E 128
 
2.0%
Other values (36) 1122
17.6%
Common
ValueCountFrequency (%)
8148
27.3%
1 5489
18.4%
2 3559
11.9%
0 3425
11.5%
3 1950
 
6.5%
4 1573
 
5.3%
5 1379
 
4.6%
6 1152
 
3.9%
7 985
 
3.3%
8 933
 
3.1%
Other values (10) 1223
 
4.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 36190
> 99.9%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
8148
22.5%
1 5489
15.2%
2 3559
9.8%
0 3425
9.5%
3 1950
 
5.4%
o 1688
 
4.7%
t 1575
 
4.4%
4 1573
 
4.3%
5 1379
 
3.8%
6 1152
 
3.2%
Other values (55) 6252
17.3%
Punctuation
ValueCountFrequency (%)
1
100.0%

PostalCode
Categorical

Distinct2497
Distinct (%)17.1%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
L5B 1M7
 
194
L5M 4Z5
 
106
L4T 2T9
 
106
L5E 1V4
 
91
L5P 1B2
 
82
Other values (2492)
14023 

Length

Max length7
Median length7
Mean length6.9893165
Min length1

Characters and Unicode

Total characters102058
Distinct characters35
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique825 ?
Unique (%)5.6%

Sample

1st rowL5T 2J3
2nd rowL5T 2J3
3rd rowL5T 2J3
4th rowL5T 2J3
5th rowL5T 2J3

Common Values

ValueCountFrequency (%)
L5B 1M7 194
 
1.3%
L5M 4Z5 106
 
0.7%
L4T 2T9 106
 
0.7%
L5E 1V4 91
 
0.6%
L5P 1B2 82
 
0.6%
L5M 4N4 56
 
0.4%
L5C 1V8 55
 
0.4%
L4W 1V5 53
 
0.4%
L5J 1K5 53
 
0.4%
L5M 1K8 50
 
0.3%
Other values (2487) 13756
94.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
l4w 2219
 
7.6%
l5t 1462
 
5.0%
l5n 1107
 
3.8%
l5b 918
 
3.1%
l5l 908
 
3.1%
l4z 889
 
3.1%
l5m 803
 
2.8%
l5s 743
 
2.5%
l5a 685
 
2.4%
l4t 614
 
2.1%
Other values (970) 18796
64.5%

Most occurring characters

ValueCountFrequency (%)
L 16184
15.9%
14594
14.3%
5 12080
11.8%
4 8651
 
8.5%
1 7541
 
7.4%
2 4723
 
4.6%
3 3100
 
3.0%
W 2912
 
2.9%
T 2650
 
2.6%
6 2100
 
2.1%
Other values (25) 27523
27.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 43736
42.9%
Decimal Number 43725
42.8%
Space Separator 14594
 
14.3%
Lowercase Letter 3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
L 16184
37.0%
W 2912
 
6.7%
T 2650
 
6.1%
A 1794
 
4.1%
N 1772
 
4.1%
M 1707
 
3.9%
B 1675
 
3.8%
Z 1559
 
3.6%
V 1491
 
3.4%
C 1368
 
3.1%
Other values (11) 10624
24.3%
Decimal Number
ValueCountFrequency (%)
5 12080
27.6%
4 8651
19.8%
1 7541
17.2%
2 4723
 
10.8%
3 3100
 
7.1%
6 2100
 
4.8%
8 1800
 
4.1%
7 1749
 
4.0%
9 1547
 
3.5%
0 434
 
1.0%
Lowercase Letter
ValueCountFrequency (%)
c 1
33.3%
k 1
33.3%
l 1
33.3%
Space Separator
ValueCountFrequency (%)
14594
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 58319
57.1%
Latin 43739
42.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 16184
37.0%
W 2912
 
6.7%
T 2650
 
6.1%
A 1794
 
4.1%
N 1772
 
4.1%
M 1707
 
3.9%
B 1675
 
3.8%
Z 1559
 
3.6%
V 1491
 
3.4%
C 1368
 
3.1%
Other values (14) 10627
24.3%
Common
ValueCountFrequency (%)
14594
25.0%
5 12080
20.7%
4 8651
14.8%
1 7541
12.9%
2 4723
 
8.1%
3 3100
 
5.3%
6 2100
 
3.6%
8 1800
 
3.1%
7 1749
 
3.0%
9 1547
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 102058
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 16184
15.9%
14594
14.3%
5 12080
11.8%
4 8651
 
8.5%
1 7541
 
7.4%
2 4723
 
4.6%
3 3100
 
3.0%
W 2912
 
2.9%
T 2650
 
2.6%
6 2100
 
2.1%
Other values (25) 27523
27.0%

Location
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct56
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
Northeast EA (West)
3858 
Gateway EA (East)
872 
Dixie EA
846 
Meadowvale Business Park CC
815 
Western Business Park EA
 
745
Other values (51)
7466 

Length

Max length27
Median length23
Mean length16.466101
Min length7

Characters and Unicode

Total characters240438
Distinct characters43
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowGateway EA (East)
2nd rowGateway EA (East)
3rd rowGateway EA (East)
4th rowGateway EA (East)
5th rowGateway EA (East)

Common Values

ValueCountFrequency (%)
Northeast EA (West) 3858
26.4%
Gateway EA (East) 872
 
6.0%
Dixie EA 846
 
5.8%
Meadowvale Business Park CC 815
 
5.6%
Western Business Park EA 745
 
5.1%
DT Core 597
 
4.1%
DT Cooksville 481
 
3.3%
Airport CC 430
 
2.9%
Mavis-Erindale EA 348
 
2.4%
Northeast EA (East) 342
 
2.3%
Other values (46) 5268
36.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ea 7460
18.5%
northeast 4200
 
10.4%
west 4172
 
10.3%
nhd 2865
 
7.1%
park 1755
 
4.3%
east 1729
 
4.3%
business 1560
 
3.9%
cc 1457
 
3.6%
dt 1273
 
3.2%
gateway 1241
 
3.1%
Other values (45) 12636
31.3%

Most occurring characters

ValueCountFrequency (%)
25746
 
10.7%
e 21487
 
8.9%
t 20118
 
8.4%
s 18236
 
7.6%
a 15739
 
6.5%
r 12415
 
5.2%
o 11229
 
4.7%
E 10169
 
4.2%
i 9045
 
3.8%
A 8491
 
3.5%
Other values (33) 87763
36.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 144599
60.1%
Uppercase Letter 58249
24.2%
Space Separator 25746
 
10.7%
Open Punctuation 5607
 
2.3%
Close Punctuation 5607
 
2.3%
Dash Punctuation 630
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 21487
14.9%
t 20118
13.9%
s 18236
12.6%
a 15739
10.9%
r 12415
8.6%
o 11229
7.8%
i 9045
6.3%
l 6725
 
4.7%
n 5207
 
3.6%
h 5051
 
3.5%
Other values (11) 19347
13.4%
Uppercase Letter
ValueCountFrequency (%)
E 10169
17.5%
A 8491
14.6%
N 8478
14.6%
C 6795
11.7%
D 4984
8.6%
W 4917
8.4%
H 3161
 
5.4%
M 2725
 
4.7%
P 2232
 
3.8%
B 1560
 
2.7%
Other values (8) 4737
8.1%
Space Separator
ValueCountFrequency (%)
25746
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5607
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5607
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 630
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 202848
84.4%
Common 37590
 
15.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 21487
 
10.6%
t 20118
 
9.9%
s 18236
 
9.0%
a 15739
 
7.8%
r 12415
 
6.1%
o 11229
 
5.5%
E 10169
 
5.0%
i 9045
 
4.5%
A 8491
 
4.2%
N 8478
 
4.2%
Other values (29) 67441
33.2%
Common
ValueCountFrequency (%)
25746
68.5%
( 5607
 
14.9%
) 5607
 
14.9%
- 630
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 240438
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
25746
 
10.7%
e 21487
 
8.9%
t 20118
 
8.4%
s 18236
 
7.6%
a 15739
 
6.5%
r 12415
 
5.2%
o 11229
 
4.7%
E 10169
 
4.2%
i 9045
 
3.8%
A 8491
 
3.5%
Other values (33) 87763
36.5%

Ward
Real number (ℝ)

Distinct11
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.4300096
Minimum1
Maximum11
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size114.2 KiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 5
median 5
Q3 7
95-th percentile 11
Maximum 11
Range 10
Interquartile range (IQR) 2

Descriptive statistics

Standard deviation2.5034319
Coefficient of variation (CV)0.46103637
Kurtosis-0.083946106
Mean5.4300096
Median Absolute Deviation (MAD)1
Skewness0.29465754
Sum79289
Variance6.2671714
MonotonicityNot monotonic
Histogram with fixed size bins (bins=11)
Value  Count  Frequency (%)
5      6084   41.7%
1      1292   8.8%
8      1210   8.3%
7      1188   8.1%
3      909    6.2%
9      883    6.0%
11     815    5.6%
4      750    5.1%
6      700    4.8%
2      621    4.3%
ValueCountFrequency (%)
1 1292
 
8.8%
2 621
 
4.3%
3 909
 
6.2%
4 750
 
5.1%
5 6084
41.7%
6 700
 
4.8%
7 1188
 
8.1%
8 1210
 
8.3%
9 883
 
6.0%
10 150
 
1.0%
ValueCountFrequency (%)
11 815
 
5.6%
10 150
 
1.0%
9 883
 
6.0%
8 1210
 
8.3%
7 1188
 
8.1%
6 700
 
4.8%
5 6084
41.7%
4 750
 
5.1%
3 909
 
6.2%
2 621
 
4.3%
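
The Ward summary statistics can be reproduced from the frequency tables alone. The sketch below uses only the Python standard library and rebuilds the column from the eleven reported counts; the variable names are illustrative and not part of the report, and the standard deviation is assumed to be the sample (n - 1) form, which is what matches the reported figure.

```python
import statistics

# Ward value -> count, copied from the frequency tables above.
ward_counts = {1: 1292, 2: 621, 3: 909, 4: 750, 5: 6084, 6: 700,
               7: 1188, 8: 1210, 9: 883, 10: 150, 11: 815}

# Expand the counts back into one observation per record.
ward = [value for value, count in sorted(ward_counts.items())
        for _ in range(count)]

mean = statistics.mean(ward)
stdev = statistics.stdev(ward)                 # sample standard deviation
median = statistics.median(ward)
q1, _, q3 = statistics.quantiles(ward, n=4)    # quartile cut points
mad = statistics.median(abs(x - median) for x in ward)

summary = {
    "n": len(ward),
    "sum": sum(ward),
    "mean": mean,
    "stdev": stdev,
    "cv": stdev / mean,        # coefficient of variation
    "median": median,
    "iqr": q3 - q1,
    "mad": mad,
}
print(summary)
```

Ward 5 alone holds 6084 of the 14602 records, which is why the median, Q1, and the most common value all coincide at 5.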

NAICSCode
Real number (ℝ)

Distinct682
Distinct (%)4.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean527383.48
Minimum1
Maximum913910
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size114.2 KiB

Quantile statistics

Minimum 1
5-th percentile 311920
Q1 417320
median 523920
Q3 621310
95-th percentile 812116
Maximum 913910
Range 913909
Interquartile range (IQR) 203990

Descriptive statistics

Standard deviation164909.49
Coefficient of variation (CV)0.3126937
Kurtosis-0.093223426
Mean527383.48
Median Absolute Deviation (MAD)100190
Skewness0.082088865
Sum7.7008535 × 10^9
Variance2.7195139 × 10^10
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
722512              632    4.3%
811111              368    2.5%
722511              346    2.4%
621110              335    2.3%
621210              312    2.1%
812115              266    1.8%
541110              255    1.7%
611110              253    1.7%
488519              229    1.6%
417230              198    1.4%
Other values (672)  11408  78.1%
ValueCountFrequency (%)
1 129
0.9%
44612 1
 
< 0.1%
212299 1
 
< 0.1%
213118 1
 
< 0.1%
213119 1
 
< 0.1%
221111 1
 
< 0.1%
221119 2
 
< 0.1%
221122 3
 
< 0.1%
221210 1
 
< 0.1%
221310 3
 
< 0.1%
ValueCountFrequency (%)
913910 19
0.1%
913140 20
0.1%
913130 1
 
< 0.1%
912910 6
 
< 0.1%
912210 5
 
< 0.1%
912190 2
 
< 0.1%
912150 1
 
< 0.1%
912130 1
 
< 0.1%
912120 1
 
< 0.1%
911910 7
 
< 0.1%
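
The lowest values listed above include 129 records with the code 1 and one record with the five-digit code 44612, while every other code shown has six digits. A minimal sketch for separating such entries before analysis follows; treating anything that is not a six-digit code as suspect is an assumption made here, not something the report states, and `split_naics_codes` is a hypothetical helper name.

```python
def split_naics_codes(codes):
    """Partition codes into six-digit values and everything else."""
    six_digit, suspect = [], []
    for code in codes:
        # Assumption: a well-formed code lies between 100000 and 999999.
        target = six_digit if 100_000 <= code <= 999_999 else suspect
        target.append(code)
    return six_digit, suspect


# Sample values taken from the frequency tables above.
sample = [722512, 1, 811111, 44612, 913910]
six_digit, suspect = split_naics_codes(sample)
print(six_digit)
print(suspect)
```

Because the column is typed as a real number, statistics such as the mean and skewness are computed over these codes as if they were quantities; filtering the suspect entries first at least keeps placeholder values out of those figures.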

NAICSDescr
Categorical

Distinct20
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
Retail
2127 
Manufacturing
1810 
Wholesale
1716 
Other Services
1695 
Professional
1285 
Other values (15)
5969 

Length

Max length21
Median length14
Mean length10.8731
Min length1

Characters and Unicode

Total characters158769
Distinct characters35
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowWholesale
2nd rowManufacturing
3rd rowManufacturing
4th rowManufacturing
5th rowWholesale

Common Values

Value              Count  Frequency (%)
Retail             2127   14.6%
Manufacturing      1810   12.4%
Wholesale          1716   11.8%
Other Services     1695   11.6%
Professional       1285   8.8%
Health Care        1254   8.6%
Accommodation      1110   7.6%
Transportation     683    4.7%
Educational        553    3.8%
Finance            526    3.6%
Other values (10)  1843   12.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
retail 2127
11.9%
manufacturing 1810
10.2%
wholesale 1716
9.6%
other 1695
9.5%
services 1695
9.5%
professional 1285
7.2%
health 1254
 
7.0%
care 1254
 
7.0%
accommodation 1110
 
6.2%
transportation 683
 
3.8%
Other values (14) 3176
17.8%

Most occurring characters

ValueCountFrequency (%)
a 16828
 
10.6%
e 16261
 
10.2%
t 12971
 
8.2%
i 12121
 
7.6%
n 10963
 
6.9%
o 10759
 
6.8%
r 10409
 
6.6%
l 9041
 
5.7%
s 8179
 
5.2%
c 7354
 
4.6%
Other values (25) 43883
27.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 137503
86.6%
Uppercase Letter 17805
 
11.2%
Space Separator 3461
 
2.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 16828
12.2%
e 16261
11.8%
t 12971
9.4%
i 12121
8.8%
n 10963
8.0%
o 10759
7.8%
r 10409
7.6%
l 9041
 
6.6%
s 8179
 
5.9%
c 7354
 
5.3%
Other values (10) 22617
16.4%
Uppercase Letter
ValueCountFrequency (%)
R 2433
13.7%
M 1910
10.7%
A 1830
10.3%
C 1730
9.7%
W 1716
9.6%
O 1695
9.5%
S 1695
9.5%
P 1362
7.6%
H 1254
7.0%
E 859
 
4.8%
Other values (4) 1321
7.4%
Space Separator
ValueCountFrequency (%)
3461
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 155308
97.8%
Common 3461
 
2.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 16828
10.8%
e 16261
10.5%
t 12971
 
8.4%
i 12121
 
7.8%
n 10963
 
7.1%
o 10759
 
6.9%
r 10409
 
6.7%
l 9041
 
5.8%
s 8179
 
5.3%
c 7354
 
4.7%
Other values (24) 40422
26.0%
Common
ValueCountFrequency (%)
3461
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 158769
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 16828
 
10.6%
e 16261
 
10.2%
t 12971
 
8.2%
i 12121
 
7.6%
n 10963
 
6.9%
o 10759
 
6.8%
r 10409
 
6.6%
l 9041
 
5.7%
s 8179
 
5.2%
c 7354
 
4.6%
Other values (25) 43883
27.6%
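
The overview alerts report that NAICSCode is highly correlated with NAICSDescr. One plausible reason is that the first two digits of a NAICS code identify its sector. The sketch below illustrates that relationship with a partial lookup built only from codes and titles that appear in this report; the pairing of each prefix with a NAICSDescr label is an assumption, and the dataset's own mapping may differ.

```python
# Two-digit prefix -> NAICSDescr label (assumed pairing, partial on purpose).
SECTOR_BY_PREFIX = {
    "41": "Wholesale",        # e.g. 417230, a Wholesaler-Distributors title
    "48": "Transportation",   # e.g. 488519, Other Freight Transportation Arrangement
    "54": "Professional",     # e.g. 541110, Offices of Lawyers
    "61": "Educational",      # e.g. 611110, Elementary and Secondary Schools
    "62": "Health Care",      # e.g. 621110, Offices of Physicians
    "72": "Accommodation",    # e.g. 722512, Limited-service eating places
    "81": "Other Services",   # e.g. 811111, General Automotive Repair
}


def sector_for(code):
    """Return the assumed sector label for a NAICS code, or 'Unknown'."""
    prefix = str(code)[:2]
    return SECTOR_BY_PREFIX.get(prefix, "Unknown")


for code in (722512, 417230, 621210, 1):
    print(code, sector_for(code))
```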

NAICSTitle
Categorical

Distinct681
Distinct (%)4.7%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
Limited-service eating places
 
632
General Automotive Repair
 
368
Full-service restaurants
 
346
Offices of Physicians
 
335
Offices of Dentists
 
312
Other values (676)
12609 

Length

Max length162
Median length70
Mean length36.517121
Min length6

Characters and Unicode

Total characters533223
Distinct characters61
Distinct categories8
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique103
Unique (%)0.7%

Sample

1st rowAmusement and Sporting Goods Wholesaler-Distributors
2nd rowSupport Activities for Printing
3rd rowSupport Activities for Printing
4th rowOther Printing
5th rowIndustrial Machinery, Equipment and Supplies Wholesaler-Distributors

Common Values

Value                                                                 Count  Frequency (%)
Limited-service eating places                                         632    4.3%
General Automotive Repair                                             368    2.5%
Full-service restaurants                                              346    2.4%
Offices of Physicians                                                 335    2.3%
Offices of Dentists                                                   312    2.1%
Beauty Salons                                                         266    1.8%
Offices of Lawyers                                                    255    1.7%
Elementary and Secondary Schools                                      253    1.7%
Other Freight Transportation Arrangement                              229    1.6%
Industrial Machinery, Equipment and Supplies Wholesaler-Distributors  198    1.4%
Other values (671)                                                    11408  78.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and 6565
 
10.2%
other 3398
 
5.3%
stores 1829
 
2.9%
offices 1664
 
2.6%
services 1655
 
2.6%
wholesaler-distributors 1644
 
2.6%
of 1605
 
2.5%
all 1448
 
2.3%
manufacturing 1319
 
2.1%
supplies 934
 
1.5%
Other values (933) 42046
65.6%

Most occurring characters

ValueCountFrequency (%)
e 54191
 
10.2%
49725
 
9.3%
i 38146
 
7.2%
r 36622
 
6.9%
n 35258
 
6.6%
t 34849
 
6.5%
a 34396
 
6.5%
s 31433
 
5.9%
o 27062
 
5.1%
l 21945
 
4.1%
Other values (51) 169596
31.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 423686
79.5%
Uppercase Letter 52055
 
9.8%
Space Separator 49917
 
9.4%
Dash Punctuation 3509
 
0.7%
Other Punctuation 2232
 
0.4%
Open Punctuation 912
 
0.2%
Close Punctuation 771
 
0.1%
Control 141
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 54191
12.8%
i 38146
9.0%
r 36622
8.6%
n 35258
 
8.3%
t 34849
 
8.2%
a 34396
 
8.1%
s 31433
 
7.4%
o 27062
 
6.4%
l 21945
 
5.2%
c 20012
 
4.7%
Other values (16) 89772
21.2%
Uppercase Letter
ValueCountFrequency (%)
S 7472
14.4%
O 5693
10.9%
C 4717
 
9.1%
A 4492
 
8.6%
M 4098
 
7.9%
P 3597
 
6.9%
D 3002
 
5.8%
W 2495
 
4.8%
E 2258
 
4.3%
F 2031
 
3.9%
Other values (15) 12200
23.4%
Other Punctuation
ValueCountFrequency (%)
, 1826
81.8%
' 160
 
7.2%
. 154
 
6.9%
& 92
 
4.1%
Space Separator
ValueCountFrequency (%)
49725
99.6%
  192
 
0.4%
Dash Punctuation
ValueCountFrequency (%)
- 3509
100.0%
Open Punctuation
ValueCountFrequency (%)
( 912
100.0%
Close Punctuation
ValueCountFrequency (%)
) 771
100.0%
Control
ValueCountFrequency (%)
141
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 475741
89.2%
Common 57482
 
10.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 54191
 
11.4%
i 38146
 
8.0%
r 36622
 
7.7%
n 35258
 
7.4%
t 34849
 
7.3%
a 34396
 
7.2%
s 31433
 
6.6%
o 27062
 
5.7%
l 21945
 
4.6%
c 20012
 
4.2%
Other values (41) 141827
29.8%
Common
ValueCountFrequency (%)
49725
86.5%
- 3509
 
6.1%
, 1826
 
3.2%
( 912
 
1.6%
) 771
 
1.3%
  192
 
0.3%
' 160
 
0.3%
. 154
 
0.3%
141
 
0.2%
& 92
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 533031
> 99.9%
None 192
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 54191
 
10.2%
49725
 
9.3%
i 38146
 
7.2%
r 36622
 
6.9%
n 35258
 
6.6%
t 34849
 
6.5%
a 34396
 
6.5%
s 31433
 
5.9%
o 27062
 
5.1%
l 21945
 
4.1%
Other values (50) 169404
31.8%
None
ValueCountFrequency (%)
  192
100.0%
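
The character breakdown for NAICSTitle shows 141 control characters, 192 space-separator characters outside the ASCII block, and more opening parentheses (912) than closing ones (771). The following sketch shows one way to normalise whitespace and flag unbalanced parentheses; both helper names and the sample strings are hypothetical, and the report does not say which code points the non-ASCII separators actually are.

```python
import unicodedata


def normalize_title(title):
    """Replace control and space-separator characters with plain spaces,
    then collapse runs of whitespace."""
    cleaned = []
    for ch in title:
        category = unicodedata.category(ch)
        # Zs = space separator (includes U+00A0), Cc = control character.
        cleaned.append(" " if category in ("Zs", "Cc") else ch)
    return " ".join("".join(cleaned).split())


def has_balanced_parentheses(title):
    """Return True when every '(' has a matching ')' in order."""
    depth = 0
    for ch in title:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Hypothetical inputs for illustration only.
print(normalize_title("Offices\u00a0of\r\nDentists "))
print(has_balanced_parentheses("All Other Stores (except Example"))
```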

Phone
Categorical

Distinct14000
Distinct (%)95.9%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
 
122
000-000-0000
 
35
905-624-3811
 
7
905-615-3200
 
6
905-677-9354
 
5
Other values (13995)
14427 

Length

Max length12
Median length12
Mean length11.908095
Min length1

Characters and Unicode

Total characters173882
Distinct characters12
Distinct categories3
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique13604
Unique (%)93.2%

Sample

1st row905-795-8900
2nd row905-795-9575
3rd row905-795-9519
4th row905-564-8121
5th row905-564-8080

Common Values

Value                 Count  Frequency (%)
(single space)        122    0.8%
000-000-0000          35     0.2%
905-624-3811          7      < 0.1%
905-615-3200          6      < 0.1%
905-677-9354          5      < 0.1%
905-615-4750          4      < 0.1%
905-625-9505          4      < 0.1%
905-670-4070          4      < 0.1%
905-615-3777          4      < 0.1%
905-795-9380          4      < 0.1%
Other values (13990)  14407  98.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
000-000-0000 35
 
0.2%
905-624-3811 7
 
< 0.1%
905-615-3200 6
 
< 0.1%
905-677-9354 5
 
< 0.1%
905-615-3777 4
 
< 0.1%
905-615-4640 4
 
< 0.1%
905-615-4653 4
 
< 0.1%
905-795-9380 4
 
< 0.1%
905-670-4070 4
 
< 0.1%
905-625-9505 4
 
< 0.1%
Other values (13989) 14403
99.5%

Most occurring characters

ValueCountFrequency (%)
- 28960
16.7%
0 26321
15.1%
5 22325
12.8%
9 21802
12.5%
2 13488
7.8%
6 13113
7.5%
8 11485
 
6.6%
7 11322
 
6.5%
1 9247
 
5.3%
4 8521
 
4.9%
Other values (2) 7298
 
4.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 144800
83.3%
Dash Punctuation 28960
 
16.7%
Space Separator 122
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 26321
18.2%
5 22325
15.4%
9 21802
15.1%
2 13488
9.3%
6 13113
9.1%
8 11485
7.9%
7 11322
7.8%
1 9247
 
6.4%
4 8521
 
5.9%
3 7176
 
5.0%
Dash Punctuation
ValueCountFrequency (%)
- 28960
100.0%
Space Separator
ValueCountFrequency (%)
122
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 173882
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
- 28960
16.7%
0 26321
15.1%
5 22325
12.8%
9 21802
12.5%
2 13488
7.8%
6 13113
7.5%
8 11485
 
6.6%
7 11322
 
6.5%
1 9247
 
5.3%
4 8521
 
4.9%
Other values (2) 7298
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 173882
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 28960
16.7%
0 26321
15.1%
5 22325
12.8%
9 21802
12.5%
2 13488
7.8%
6 13113
7.5%
8 11485
 
6.6%
7 11322
 
6.5%
1 9247
 
5.3%
4 8521
 
4.9%
Other values (2) 7298
 
4.2%
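
Phone reports 0 missing values, yet 122 records hold a single space and 35 hold 000-000-0000, so about 157 records (roughly 1.1%) carry no usable number. A minimal sketch of normalising such entries is shown here; treating 000-000-0000 as a placeholder is an assumption, and `clean_phone` is a hypothetical helper, not part of the profiling tool.

```python
import re

# NNN-NNN-NNNN, the only shape seen in the sample rows above.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")
# Assumption: an all-zero number is a stand-in for "unknown".
PLACEHOLDERS = {"000-000-0000"}


def clean_phone(raw):
    """Return the phone number, or None for blanks, placeholders,
    and values that do not match the expected pattern."""
    value = raw.strip()
    if not value or value in PLACEHOLDERS:
        return None
    return value if PHONE_PATTERN.fullmatch(value) else None


for raw in (" ", "000-000-0000", "905-795-8900"):
    print(repr(raw), "->", clean_phone(raw))
```

Counting the `None` results after such a pass gives an effective missing rate, which is a more honest figure than the 0.0% shown for the raw column.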

Fax
Categorical

Distinct9400
Distinct (%)64.4%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
4891 
905-822-2673
 
9
905-361-6401
 
8
905-896-9380
 
7
905-890-1959
 
4
Other values (9395)
9683 

Length

Max length12
Median length12
Mean length8.3155047
Min length1

Characters and Unicode

Total characters121423
Distinct characters12
Distinct categories3
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique9127
Unique (%)62.5%

Sample

1st row905-795-8988
2nd row905-795-8775
3rd row905-795-8775
4th row905-564-7395
5th row905-564-5003

Common Values

ValueCountFrequency (%)
4891
33.5%
905-822-2673 9
 
0.1%
905-361-6401 8
 
0.1%
905-896-9380 7
 
< 0.1%
905-890-1959 4
 
< 0.1%
905-502-6982 4
 
< 0.1%
905-795-9381 4
 
< 0.1%
905-607-9204 4
 
< 0.1%
905-625-8245 3
 
< 0.1%
905-271-0376 3
 
< 0.1%
Other values (9390) 9665
66.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
905-822-2673 9
 
0.1%
905-361-6401 8
 
0.1%
905-896-9380 7
 
0.1%
905-890-1959 4
 
< 0.1%
905-502-6982 4
 
< 0.1%
905-795-9381 4
 
< 0.1%
905-607-9204 4
 
< 0.1%
905-625-4815 3
 
< 0.1%
905-279-0023 3
 
< 0.1%
905-670-3436 3
 
< 0.1%
Other values (9389) 9662
99.5%

Most occurring characters

ValueCountFrequency (%)
- 19422
16.0%
0 15950
13.1%
5 15664
12.9%
9 15194
12.5%
6 9361
7.7%
2 8884
7.3%
8 7927
6.5%
7 7560
 
6.2%
1 6026
 
5.0%
4 5498
 
4.5%
Other values (2) 9937
8.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 97110
80.0%
Dash Punctuation 19422
 
16.0%
Space Separator 4891
 
4.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 15950
16.4%
5 15664
16.1%
9 15194
15.6%
6 9361
9.6%
2 8884
9.1%
8 7927
8.2%
7 7560
7.8%
1 6026
 
6.2%
4 5498
 
5.7%
3 5046
 
5.2%
Dash Punctuation
ValueCountFrequency (%)
- 19422
100.0%
Space Separator
ValueCountFrequency (%)
4891
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 121423
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
- 19422
16.0%
0 15950
13.1%
5 15664
12.9%
9 15194
12.5%
6 9361
7.7%
2 8884
7.3%
8 7927
6.5%
7 7560
 
6.2%
1 6026
 
5.0%
4 5498
 
4.5%
Other values (2) 9937
8.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 121423
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 19422
16.0%
0 15950
13.1%
5 15664
12.9%
9 15194
12.5%
6 9361
7.7%
2 8884
7.3%
8 7927
6.5%
7 7560
 
6.2%
1 6026
 
5.0%
4 5498
 
4.5%
Other values (2) 9937
8.2%

TollFree
Categorical

Distinct1952
Distinct (%)13.4%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
12602 
1-888-509-8455
 
4
1-800-769-2511
 
4
1-800-465-2422
 
4
1-800-472-6842
 
4
Other values (1947)
1984 

Length

Max length14
Median length1
Mean length2.780578
Min length1

Characters and Unicode

Total characters40602
Distinct characters12
Distinct categories3
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1914
Unique (%)13.1%

Sample

1st row1-800-668-1101
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
12602
86.3%
1-888-509-8455 4
 
< 0.1%
1-800-769-2511 4
 
< 0.1%
1-800-465-2422 4
 
< 0.1%
1-800-472-6842 4
 
< 0.1%
1-866-567-8888 3
 
< 0.1%
1-800-387-1282 3
 
< 0.1%
1-800-567-7800 3
 
< 0.1%
1-877-777-8672 3
 
< 0.1%
1-800-563-4327 2
 
< 0.1%
Other values (1942) 1970
 
13.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1-888-509-8455 4
 
0.2%
1-800-465-2422 4
 
0.2%
1-800-472-6842 4
 
0.2%
1-800-769-2511 4
 
0.2%
1-866-567-8888 3
 
0.1%
1-800-387-1282 3
 
0.1%
1-800-567-7800 3
 
0.1%
1-877-777-8672 3
 
0.1%
1-800-895-8897 2
 
0.1%
1-800-509-9153 2
 
0.1%
Other values (1941) 1968
98.4%

Most occurring characters

ValueCountFrequency (%)
12602
31.0%
- 6000
14.8%
8 4274
 
10.5%
1 2978
 
7.3%
0 2626
 
6.5%
6 2484
 
6.1%
7 2210
 
5.4%
2 1724
 
4.2%
5 1707
 
4.2%
3 1494
 
3.7%
Other values (2) 2503
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 22000
54.2%
Space Separator 12602
31.0%
Dash Punctuation 6000
 
14.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
8 4274
19.4%
1 2978
13.5%
0 2626
11.9%
6 2484
11.3%
7 2210
10.0%
2 1724
7.8%
5 1707
 
7.8%
3 1494
 
6.8%
4 1323
 
6.0%
9 1180
 
5.4%
Space Separator
ValueCountFrequency (%)
12602
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 40602
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
12602
31.0%
- 6000
14.8%
8 4274
 
10.5%
1 2978
 
7.3%
0 2626
 
6.5%
6 2484
 
6.1%
7 2210
 
5.4%
2 1724
 
4.2%
5 1707
 
4.2%
3 1494
 
3.7%
Other values (2) 2503
 
6.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 40602
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
12602
31.0%
- 6000
14.8%
8 4274
 
10.5%
1 2978
 
7.3%
0 2626
 
6.5%
6 2484
 
6.1%
7 2210
 
5.4%
2 1724
 
4.2%
5 1707
 
4.2%
3 1494
 
3.7%
Other values (2) 2503
 
6.2%
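
The TollFree length figures follow directly from the character counts. As a worked check, assuming every populated entry has the 14-character shape 1-NNN-NNN-NNNN (three dashes each) and every other entry is a single space:

```python
records = 14602        # number of observations in the dataset
blank = 12602          # entries consisting of a single space
dashes = 6000          # total dash characters reported for TollFree

# Assumption: each populated entry contains exactly three dashes.
populated = dashes // 3

# One character per blank entry, fourteen per populated entry.
total_chars = blank * 1 + populated * 14
mean_length = total_chars / records

print(populated, total_chars, round(mean_length, 6))
```

The totals agree with the reported 40602 characters and mean length of 2.780578, and because 86.3% of entries are blank the median length is 1.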

EMail
Categorical

Distinct8264
Distinct (%)56.6%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
6200 
info@publicstoragecanada.com
 
5
info@ucmas.ca
 
4
customerservice@premier-gift.com
 
3
cyclone@cyclonemfg.com
 
3
Other values (8259)
8387 

Length

Max length50
Median length46
Mean length13.322216
Min length1

Characters and Unicode

Total characters194531
Distinct characters73
Distinct categories9
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique8139
Unique (%)55.7%

Sample

1st rowlfinch@golftrendsinc.com
2nd rowprepress@apexgraphics.com
3rd row
4th rowinfo@printmedia.ca
5th rowshsieh@swrltd.com

Common Values

ValueCountFrequency (%)
6200
42.5%
info@publicstoragecanada.com 5
 
< 0.1%
info@ucmas.ca 4
 
< 0.1%
customerservice@premier-gift.com 3
 
< 0.1%
cyclone@cyclonemfg.com 3
 
< 0.1%
millertrailers@rogers.com 3
 
< 0.1%
info@superchoicecarpet.ca 3
 
< 0.1%
info@yogurtys.com 3
 
< 0.1%
customerservice@csr.payless.com 3
 
< 0.1%
ktc.ca.info@kapsch.net 3
 
< 0.1%
Other values (8254) 8372
57.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
info@publicstoragecanada.com 5
 
0.1%
info@ucmas.ca 4
 
< 0.1%
customerservice@premier-gift.com 3
 
< 0.1%
millertrailers@rogers.com 3
 
< 0.1%
info@superchoicecarpet.ca 3
 
< 0.1%
info@yogurtys.com 3
 
< 0.1%
customerservice@csr.payless.com 3
 
< 0.1%
ktc.ca.info@kapsch.net 3
 
< 0.1%
info@realfruitbubbletea.com 3
 
< 0.1%
info@akaloptical.com 3
 
< 0.1%
Other values (8261) 8379
99.6%

Most occurring characters

ValueCountFrequency (%)
o 17531
 
9.0%
a 17022
 
8.8%
c 14678
 
7.5%
e 13094
 
6.7%
i 12936
 
6.6%
n 11347
 
5.8%
m 10951
 
5.6%
s 10407
 
5.3%
r 9544
 
4.9%
. 9175
 
4.7%
Other values (63) 67846
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 168332
86.5%
Other Punctuation 17580
 
9.0%
Space Separator 6209
 
3.2%
Decimal Number 1770
 
0.9%
Dash Punctuation 337
 
0.2%
Uppercase Letter 162
 
0.1%
Connector Punctuation 138
 
0.1%
Control 2
 
< 0.1%
Math Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 17531
10.4%
a 17022
10.1%
c 14678
 
8.7%
e 13094
 
7.8%
i 12936
 
7.7%
n 11347
 
6.7%
m 10951
 
6.5%
s 10407
 
6.2%
r 9544
 
5.7%
t 8956
 
5.3%
Other values (16) 41866
24.9%
Uppercase Letter
ValueCountFrequency (%)
M 18
 
11.1%
S 15
 
9.3%
C 11
 
6.8%
H 11
 
6.8%
T 10
 
6.2%
D 9
 
5.6%
P 9
 
5.6%
G 8
 
4.9%
K 8
 
4.9%
A 8
 
4.9%
Other values (13) 55
34.0%
Decimal Number
ValueCountFrequency (%)
1 300
16.9%
0 297
16.8%
2 274
15.5%
3 168
9.5%
5 141
8.0%
4 133
7.5%
7 123
6.9%
6 122
6.9%
8 119
 
6.7%
9 93
 
5.3%
Other Punctuation
ValueCountFrequency (%)
. 9175
52.2%
@ 8392
47.7%
· 5
 
< 0.1%
/ 3
 
< 0.1%
# 1
 
< 0.1%
, 1
 
< 0.1%
' 1
 
< 0.1%
: 1
 
< 0.1%
& 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
6209
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 337
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 138
100.0%
Control
ValueCountFrequency (%)
2
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 168494
86.6%
Common 26037
 
13.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 17531
10.4%
a 17022
10.1%
c 14678
 
8.7%
e 13094
 
7.8%
i 12936
 
7.7%
n 11347
 
6.7%
m 10951
 
6.5%
s 10407
 
6.2%
r 9544
 
5.7%
t 8956
 
5.3%
Other values (39) 42028
24.9%
Common
ValueCountFrequency (%)
. 9175
35.2%
@ 8392
32.2%
6209
23.8%
- 337
 
1.3%
1 300
 
1.2%
0 297
 
1.1%
2 274
 
1.1%
3 168
 
0.6%
5 141
 
0.5%
_ 138
 
0.5%
Other values (14) 606
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 194526
> 99.9%
None 5
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 17531
 
9.0%
a 17022
 
8.8%
c 14678
 
7.5%
e 13094
 
6.7%
i 12936
 
6.7%
n 11347
 
5.8%
m 10951
 
5.6%
s 10407
 
5.3%
r 9544
 
4.9%
. 9175
 
4.7%
Other values (62) 67841
34.9%
None
ValueCountFrequency (%)
· 5
100.0%
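
The EMail character tables hint at a few malformed entries: 6200 values are blank, the column contains 8392 "@" characters for 8402 non-blank values, and stray characters such as ",", "#", ":" and "·" each appear at least once. A rough triage could look like the sketch below. The pattern is deliberately loose and is not a full address validator; `classify_email` and the malformed sample are hypothetical.

```python
import re

# Loose shape check: local part, "@", then a dotted domain.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._+\-]+@[A-Za-z0-9\-]+(\.[A-Za-z0-9\-]+)+"
)


def classify_email(raw):
    """Label an entry as 'blank', 'ok', or 'suspect'."""
    value = raw.strip()
    if not value:
        return "blank"
    return "ok" if EMAIL_PATTERN.fullmatch(value) else "suspect"


samples = [
    " ",                          # blank entry, as in the 3rd sample row
    "info@printmedia.ca",         # from the sample rows
    "ktc.ca.info@kapsch.net",     # from the common values
    "info@example,com",           # hypothetical malformed value
]
for raw in samples:
    print(repr(raw), "->", classify_email(raw))
```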

WebAddress
Categorical

Distinct8810
Distinct (%)60.3%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
4523 
www.dpcdsb.org
 
42
www.subway.com
 
37
www.timhortons.com
 
34
www.petro-canada.ca
 
22
Other values (8805)
9944 

Length

Max length50
Median length42
Mean length13.796466
Min length1

Characters and Unicode

Total characters201456
Distinct characters74
Distinct categories10
Distinct scripts2
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique8240
Unique (%)56.4%

Sample

1st rowwww.golftrendsinc.com
2nd rowwww.apexgraphics.com
3rd row
4th rowwww.printmedia.ca
5th rowwww.swrltd.com

Common Values

ValueCountFrequency (%)
4523
31.0%
www.dpcdsb.org 42
 
0.3%
www.subway.com 37
 
0.3%
www.timhortons.com 34
 
0.2%
www.petro-canada.ca 22
 
0.2%
www.mississauga.ca/portal/residents/fire 19
 
0.1%
www.dollarama.com 18
 
0.1%
www.shoppersdrugmart.ca 18
 
0.1%
www.edwardjones.com 16
 
0.1%
www.pizzapizza.ca 15
 
0.1%
Other values (8800) 9858
67.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
www.dpcdsb.org 42
 
0.4%
www.subway.com 37
 
0.4%
www.timhortons.com 34
 
0.3%
www.petro-canada.ca 22
 
0.2%
www.mississauga.ca/portal/residents/fire 19
 
0.2%
www.dollarama.com 18
 
0.2%
www.shoppersdrugmart.ca 18
 
0.2%
www.edwardjones.com 16
 
0.2%
www.pizzapizza.ca 15
 
0.1%
www.td.com 15
 
0.1%
Other values (8790) 9853
97.7%

Most occurring characters

ValueCountFrequency (%)
w 31665
15.7%
. 20406
 
10.1%
c 15942
 
7.9%
a 15303
 
7.6%
o 14479
 
7.2%
e 11663
 
5.8%
m 10005
 
5.0%
s 8916
 
4.4%
i 8895
 
4.4%
r 8816
 
4.4%
Other values (64) 55366
27.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 175114
86.9%
Other Punctuation 20685
 
10.3%
Space Separator 4532
 
2.2%
Dash Punctuation 474
 
0.2%
Decimal Number 428
 
0.2%
Uppercase Letter 206
 
0.1%
Math Symbol 12
 
< 0.1%
Connector Punctuation 2
 
< 0.1%
Control 2
 
< 0.1%
Modifier Symbol 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
w 31665
18.1%
c 15942
 
9.1%
a 15303
 
8.7%
o 14479
 
8.3%
e 11663
 
6.7%
m 10005
 
5.7%
s 8916
 
5.1%
i 8895
 
5.1%
r 8816
 
5.0%
t 8342
 
4.8%
Other values (17) 41088
23.5%
Uppercase Letter
ValueCountFrequency (%)
C 20
 
9.7%
S 17
 
8.3%
W 16
 
7.8%
M 14
 
6.8%
A 13
 
6.3%
T 11
 
5.3%
B 11
 
5.3%
L 11
 
5.3%
F 10
 
4.9%
R 10
 
4.9%
Other values (15) 73
35.4%
Decimal Number
ValueCountFrequency (%)
1 97
22.7%
2 90
21.0%
0 61
14.3%
4 59
13.8%
3 43
10.0%
6 21
 
4.9%
8 18
 
4.2%
9 18
 
4.2%
5 12
 
2.8%
7 9
 
2.1%
Other Punctuation
ValueCountFrequency (%)
. 20406
98.7%
/ 264
 
1.3%
@ 11
 
0.1%
& 2
 
< 0.1%
· 1
 
< 0.1%
, 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
4532
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 474
100.0%
Math Symbol
ValueCountFrequency (%)
~ 12
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%
Control
ValueCountFrequency (%)
2
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 175320
87.0%
Common 26136
 
13.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
w 31665
18.1%
c 15942
 
9.1%
a 15303
 
8.7%
o 14479
 
8.3%
e 11663
 
6.7%
m 10005
 
5.7%
s 8916
 
5.1%
i 8895
 
5.1%
r 8816
 
5.0%
t 8342
 
4.8%
Other values (42) 41294
23.6%
Common
ValueCountFrequency (%)
. 20406
78.1%
4532
 
17.3%
- 474
 
1.8%
/ 264
 
1.0%
1 97
 
0.4%
2 90
 
0.3%
0 61
 
0.2%
4 59
 
0.2%
3 43
 
0.2%
6 21
 
0.1%
Other values (12) 89
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 201452
> 99.9%
None 4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
w 31665
15.7%
. 20406
 
10.1%
c 15942
 
7.9%
a 15303
 
7.6%
o 14479
 
7.2%
e 11663
 
5.8%
m 10005
 
5.0%
s 8916
 
4.4%
i 8895
 
4.4%
r 8816
 
4.4%
Other values (62) 55362
27.5%
None
ValueCountFrequency (%)
é 3
75.0%
· 1
 
25.0%
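
WebAddress mixes blanks (4523 entries), a small number of uppercase letters, and 11 "@" characters that suggest e-mail addresses entered in the wrong field. One possible normalisation pass, sketched with the standard library, is shown below; the "@" rule is a heuristic assumption, and `normalize_web_address` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def normalize_web_address(raw):
    """Return a lower-cased host plus path, or None when the entry is
    blank or looks like an e-mail address."""
    value = raw.strip()
    if not value:
        return None
    if "@" in value:
        return None  # heuristic: probably an e-mail, not a URL
    # Prefix "//" so urlsplit treats the leading part as the host.
    parts = urlsplit(value if "//" in value else "//" + value)
    host = parts.netloc.lower()
    return host + parts.path.rstrip("/")


for raw in (" ", "www.mississauga.ca/portal/residents/fire",
            "WWW.TD.com/", "info@ucmas.ca"):
    print(repr(raw), "->", normalize_web_address(raw))
```

Lower-casing only the host keeps any case-sensitive path intact while still letting entries such as www.td.com collapse to one value.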

EmplRange
Categorical

Distinct9
Distinct (%)0.1%
Missing567
Missing (%)3.9%
Memory size114.2 KiB
1 to 4
7005 
5 to 9
2824 
10 to 19
1835 
20 to 49
1403 
50 to 99
 
536
Other values (4)
 
432

Length

Max length10
Median length6
Mean length6.6555753
Min length5

Characters and Unicode

Total characters93411
Distinct characters11
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row10 to 19
2nd row20 to 49
3rd row50 to 99
4th row1 to 4
5th row5 to 9

Common Values

Value       Count  Frequency (%)
1 to 4      7005   48.0%
5 to 9      2824   19.3%
10 to 19    1835   12.6%
20 to 49    1403   9.6%
50 to 99    536    3.7%
100 to 299  347    2.4%
300 to 499  46     0.3%
500 to 999  24     0.2%
1000+       15     0.1%
(Missing)   567    3.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
to 14020
33.3%
1 7005
16.6%
4 7005
16.6%
5 2824
 
6.7%
9 2824
 
6.7%
10 1835
 
4.4%
19 1835
 
4.4%
20 1403
 
3.3%
49 1403
 
3.3%
99 536
 
1.3%
Other values (8) 1385
 
3.3%

Most occurring characters

ValueCountFrequency (%)
28040
30.0%
t 14020
15.0%
o 14020
15.0%
1 11037
 
11.8%
4 8454
 
9.1%
9 7992
 
8.6%
0 4653
 
5.0%
5 3384
 
3.6%
2 1750
 
1.9%
3 46
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 37316
39.9%
Space Separator 28040
30.0%
Lowercase Letter 28040
30.0%
Math Symbol 15
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 11037
29.6%
4 8454
22.7%
9 7992
21.4%
0 4653
12.5%
5 3384
 
9.1%
2 1750
 
4.7%
3 46
 
0.1%
Lowercase Letter
ValueCountFrequency (%)
t 14020
50.0%
o 14020
50.0%
Space Separator
ValueCountFrequency (%)
28040
100.0%
Math Symbol
ValueCountFrequency (%)
+ 15
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 65371
70.0%
Latin 28040
30.0%

Most frequent character per script

Common
ValueCountFrequency (%)
28040
42.9%
1 11037
 
16.9%
4 8454
 
12.9%
9 7992
 
12.2%
0 4653
 
7.1%
5 3384
 
5.2%
2 1750
 
2.7%
3 46
 
0.1%
+ 15
 
< 0.1%
Latin
ValueCountFrequency (%)
t 14020
50.0%
o 14020
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 93411
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
28040
30.0%
t 14020
15.0%
o 14020
15.0%
1 11037
 
11.8%
4 8454
 
9.1%
9 7992
 
8.6%
0 4653
 
5.0%
5 3384
 
3.6%
2 1750
 
1.9%
3 46
 
< 0.1%
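
EmplRange is stored as text, so sorting it alphabetically would place "10 to 19" before "5 to 9". The sketch below parses each label into numeric bounds so the nine categories can be ordered by size; `parse_empl_range` is a hypothetical helper, and representing the open-ended "1000+" bucket with an upper bound of `None` is a design assumption.

```python
import re

RANGE_PATTERN = re.compile(r"(\d+) to (\d+)")


def parse_empl_range(label):
    """Return (low, high) for a label such as '5 to 9';
    the open-ended '1000+' yields (1000, None)."""
    if label.endswith("+"):
        return int(label[:-1]), None
    match = RANGE_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised range label: {label!r}")
    return int(match.group(1)), int(match.group(2))


labels = ["1000+", "10 to 19", "1 to 4", "100 to 299", "5 to 9"]
ordered = sorted(labels, key=lambda label: parse_empl_range(label)[0])
print(ordered)
```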

EmplUpdate
Categorical

Distinct299
Distinct (%)2.1%
Missing91
Missing (%)0.6%
Memory size114.2 KiB
2016/10/31 00:00:00+00
4497 
2015/10/31 00:00:00+00
4060 
2016/08/12 00:00:00+00
 
271
2016/08/09 00:00:00+00
 
256
2016/08/17 00:00:00+00
 
254
Other values (294)
5173 

Length

Max length22
Median length22
Mean length22
Min length22

Characters and Unicode

Total characters319242
Distinct characters14
Distinct categories4
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique80
Unique (%)0.6%

Sample

1st row2015/10/31 00:00:00+00
2nd row2016/10/31 00:00:00+00
3rd row2015/10/31 00:00:00+00
4th row2015/10/31 00:00:00+00
5th row2015/10/31 00:00:00+00

Common Values

Value                   Count  Frequency (%)
2016/10/31 00:00:00+00  4497   30.8%
2015/10/31 00:00:00+00  4060   27.8%
2016/08/12 00:00:00+00  271    1.9%
2016/08/09 00:00:00+00  256    1.8%
2016/08/17 00:00:00+00  254    1.7%
2016/08/16 00:00:00+00  245    1.7%
2016/07/22 00:00:00+00  236    1.6%
2016/07/29 00:00:00+00  233    1.6%
2016/08/19 00:00:00+00  231    1.6%
2016/08/11 00:00:00+00  187    1.3%
Other values (289)      4041   27.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
00:00:00+00 14511
50.0%
2016/10/31 4497
 
15.5%
2015/10/31 4060
 
14.0%
2016/08/12 271
 
0.9%
2016/08/09 256
 
0.9%
2016/08/17 254
 
0.9%
2016/08/16 245
 
0.8%
2016/07/22 236
 
0.8%
2016/07/29 233
 
0.8%
2016/08/19 231
 
0.8%
Other values (290) 4228
 
14.6%

Most occurring characters

ValueCountFrequency (%)
0 146800
46.0%
1 34876
 
10.9%
/ 29022
 
9.1%
: 29022
 
9.1%
2 17302
 
5.4%
14511
 
4.5%
+ 14511
 
4.5%
3 9486
 
3.0%
6 8953
 
2.8%
5 6174
 
1.9%
Other values (4) 8585
 
2.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 232176
72.7%
Other Punctuation 58044
 
18.2%
Space Separator 14511
 
4.5%
Math Symbol 14511
 
4.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 146800
63.2%
1 34876
 
15.0%
2 17302
 
7.5%
3 9486
 
4.1%
6 8953
 
3.9%
5 6174
 
2.7%
8 4356
 
1.9%
7 2116
 
0.9%
9 1309
 
0.6%
4 804
 
0.3%
Other Punctuation
ValueCountFrequency (%)
/ 29022
50.0%
: 29022
50.0%
Space Separator
ValueCountFrequency (%)
14511
100.0%
Math Symbol
ValueCountFrequency (%)
+ 14511
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 319242
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 146800
46.0%
1 34876
 
10.9%
/ 29022
 
9.1%
: 29022
 
9.1%
2 17302
 
5.4%
14511
 
4.5%
+ 14511
 
4.5%
3 9486
 
3.0%
6 8953
 
2.8%
5 6174
 
1.9%
Other values (4) 8585
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 319242
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 146800
46.0%
1 34876
 
10.9%
/ 29022
 
9.1%
: 29022
 
9.1%
2 17302
 
5.4%
14511
 
4.5%
+ 14511
 
4.5%
3 9486
 
3.0%
6 8953
 
2.8%
5 6174
 
1.9%
Other values (4) 8585
 
2.7%
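
EmplUpdate is profiled as Categorical, but every value is a 22-character timestamp of the form 2016/10/31 00:00:00+00. A minimal parsing sketch follows. It assumes the trailing "+00" is an hour-only UTC offset; Python's `%z` directive expects hours and minutes, so the helper pads the offset before parsing. `parse_empl_update` is a hypothetical name.

```python
from datetime import datetime, timezone


def parse_empl_update(raw):
    """Parse 'YYYY/MM/DD HH:MM:SS+HH' into a timezone-aware datetime."""
    # Pad the hour-only offset ("+00") to "+0000" for the %z directive.
    return datetime.strptime(raw + "00", "%Y/%m/%d %H:%M:%S%z")


parsed = parse_empl_update("2016/10/31 00:00:00+00")
print(parsed.isoformat())
```

Once parsed, the column can be treated as a date, which makes the two dominant values (31 October of 2015 and of 2016) easy to group by year.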

Sector_Des
Categorical

Distinct29
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size114.2 KiB
12383 
Financial Services
 
870
Food and Beverage
 
444
Automotive
 
329
Life Sciences
 
263
Other values (24)
 
313

Length

Max length57
Median length1
Mean length3.3009177
Min length1

Characters and Unicode

Total characters48200
Distinct characters26
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6
Unique (%)< 0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value                                   Count  Frequency (%)
(single space)                          12383  84.8%
Financial Services                      870    6.0%
Food and Beverage                       444    3.0%
Automotive                              329    2.3%
Life Sciences                           263    1.8%
Aerospace                               132    0.9%
Automotive,Aerospace                    55     0.4%
Automotive,Food and Beverage            24     0.2%
Cleantech                               24     0.2%
Automotive,Aerospace,Food and Beverage  15     0.1%
Other values (19)                       63     0.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
services 884
19.8%
financial 870
19.5%
and 528
11.9%
beverage 514
11.5%
food 452
10.1%
automotive 329
 
7.4%
life 281
 
6.3%
sciences 265
 
5.9%
aerospace 132
 
3.0%
automotive,aerospace 55
 
1.2%
Other values (15) 145
 
3.3%

Most occurring characters

ValueCountFrequency (%)
14623
30.3%
e 5221
 
10.8%
i 3691
 
7.7%
a 3091
 
6.4%
c 2627
 
5.5%
n 2626
 
5.4%
o 2183
 
4.5%
v 1859
 
3.9%
r 1645
 
3.4%
s 1413
 
2.9%
Other values (16) 9221
19.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 29243
60.7%
Space Separator 14623
30.3%
Uppercase Letter 4130
 
8.6%
Other Punctuation 204
 
0.4%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 5221 17.9%
i 3691 12.6%
a 3091 10.6%
c 2627 9.0%
n 2626 9.0%
o 2183 7.5%
v 1859 6.4%
r 1645 5.6%
s 1413 4.8%
d 1056 3.6%
Other values (8) 3831 13.1%
Uppercase Letter
Value Count Frequency (%)
F 1412 34.2%
S 1180 28.6%
A 680 16.5%
B 528 12.8%
L 296 7.2%
C 34 0.8%
Space Separator
Value Count Frequency (%)
(space) 14623 100.0%
Other Punctuation
Value Count Frequency (%)
, 204 100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 33373 69.2%
Common 14827 30.8%

Most frequent character per script

Latin
Value Count Frequency (%)
e 5221 15.6%
i 3691 11.1%
a 3091 9.3%
c 2627 7.9%
n 2626 7.9%
o 2183 6.5%
v 1859 5.6%
r 1645 4.9%
s 1413 4.2%
F 1412 4.2%
Other values (14) 7605 22.8%
Common
Value Count Frequency (%)
(space) 14623 98.6%
, 204 1.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 48200 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 14623 30.3%
e 5221 10.8%
i 3691 7.7%
a 3091 6.4%
c 2627 5.5%
n 2626 5.4%
o 2183 4.5%
v 1859 3.9%
r 1645 3.4%
s 1413 2.9%
Other values (16) 9221 19.1%

CENT_X
Real number (ℝ)

Distinct 4283
Distinct (%) 29.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 608663.78
Minimum 596627.93
Maximum 616985.06
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 114.2 KiB

Quantile statistics

Minimum 596627.93
5-th percentile 601466.22
Q1 606485.11
median 608933.88
Q3 611391.34
95-th percentile 614838.89
Maximum 616985.06
Range 20357.121
Interquartile range (IQR) 4906.2272

Descriptive statistics

Standard deviation 3846.6868
Coefficient of variation (CV) 0.0063198878
Kurtosis -0.065173007
Mean 608663.78
Median Absolute Deviation (MAD) 2453.003
Skewness -0.41309064
Sum 8.8877085 × 10^9
Variance 14796999
Monotonicity Not monotonic
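
As a quick consistency check, several of the derived figures above follow directly from the reported mean, standard deviation and quartiles. The sketch below recomputes them in Python from the rounded values printed in this report, so the results agree only up to rounding; the variable names are illustrative.

```python
# Values copied from the CENT_X statistics above (already rounded by the report).
mean = 608663.78
std = 3846.6868
q1, q3 = 606485.11, 611391.34

cv = std / mean        # coefficient of variation = standard deviation / mean
variance = std ** 2    # variance = standard deviation squared
iqr = q3 - q1          # interquartile range = Q3 - Q1

print(f"CV       = {cv:.10f}")
print(f"Variance = {variance:.0f}")
print(f"IQR      = {iqr:.2f}")
```

Small differences from the reported figures come from the inputs being rounded for display.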
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
609556.5032 185 1.3%
612552.1674 123 0.8%
604009.418 113 0.8%
609657.7584 107 0.7%
615480.8966 91 0.6%
604848.575 56 0.4%
612581.1624 53 0.4%
600161.54 51 0.3%
608826.735 50 0.3%
604541.8476 50 0.3%
Other values (4273) 13723 94.0%
Value Count Frequency (%)
596627.9342 1 < 0.1%
596752.9696 1 < 0.1%
597309.0542 1 < 0.1%
597312.632 1 < 0.1%
597772.3526 20 0.1%
597782.4012 1 < 0.1%
597812.404 1 < 0.1%
597933.2448 5 < 0.1%
597963.9396 12 0.1%
598104.1884 11 0.1%
Value Count Frequency (%)
616985.0552 4 < 0.1%
616836.9092 1 < 0.1%
616794.193 1 < 0.1%
616756.05 1 < 0.1%
616706.7026 1 < 0.1%
616695.363 2 < 0.1%
616668.1574 1 < 0.1%
616646.1146 1 < 0.1%
616638.6966 1 < 0.1%
616626.2096 1 < 0.1%

CENT_Y
Real number (ℝ)

Distinct 4283
Distinct (%) 29.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 4829572.8
Minimum 4815609.1
Maximum 4843107.8
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 114.2 KiB

Quantile statistics

Minimum 4815609.1
5-th percentile 4819704.2
Q1 4825906.1
median 4829209.5
Q3 4833778.4
95-th percentile 4839313
Maximum 4843107.8
Range 27498.788
Interquartile range (IQR) 7872.3394

Descriptive statistics

Standard deviation 5659.9228
Coefficient of variation (CV) 0.0011719303
Kurtosis -0.59083802
Mean 4829572.8
Median Absolute Deviation (MAD) 3908.3803
Skewness 0.013561312
Sum 7.0521422 × 10^10
Variance 32034726
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
4827620.949 185 1.3%
4837278.362 123 0.8%
4823628.592 113 0.8%
4841687.188 107 0.7%
4827728.859 91 0.6%
4824071.126 56 0.4%
4831178.774 53 0.4%
4826202.792 51 0.3%
4820599.19 50 0.3%
4823713.954 50 0.3%
Other values (4273) 13723 94.0%
Value Count Frequency (%)
4815609.051 1 < 0.1%
4816109.607 1 < 0.1%
4816333.508 1 < 0.1%
4816381.801 2 < 0.1%
4816389.354 1 < 0.1%
4816663.969 1 < 0.1%
4816718.415 1 < 0.1%
4816798.73 1 < 0.1%
4816829.613 1 < 0.1%
4816881.173 2 < 0.1%
Value Count Frequency (%)
4843107.84 10 0.1%
4843040.829 1 < 0.1%
4842998.68 1 < 0.1%
4842855.077 1 < 0.1%
4842717.945 1 < 0.1%
4842534.357 1 < 0.1%
4842303.169 2 < 0.1%
4842272.626 1 < 0.1%
4842238.75 1 < 0.1%
4842206.186 2 < 0.1%

Interactions

Correlations

Auto

The auto setting picks an interpretable pairwise metric for each pair of columns, according to the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
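
The mapping above can be sketched as a small lookup. This is a hypothetical illustration, not the report generator's own code: `auto_method` and `discretize` are made-up helper names, and equal-width binning is an assumption (the report does not say which binning scheme is used).

```python
# Hypothetical lookup mirroring the mapping listed in the text.
AUTO_METHODS = {
    ("categorical", "categorical"): ("Cramér's V", (0.0, 1.0)),
    ("categorical", "numerical"): ("Cramér's V (discretized)", (0.0, 1.0)),
    ("numerical", "numerical"): ("Spearman's rho", (-1.0, 1.0)),
}


def auto_method(type_a, type_b):
    """Return (method name, value range) for a pair of column types."""
    key = tuple(sorted((type_a, type_b)))  # order of the pair does not matter
    return AUTO_METHODS[key]


def discretize(values, n_bins):
    """Equal-width binning (an assumption) of a numerical column into n_bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant columns
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(auto_method("numerical", "categorical"))
print(discretize([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n_bins=5))
```

Fewer bins give a coarser view of the numerical column, and more bins a finer one, which is the granularity trade-off mentioned above.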

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
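
A minimal pure-Python sketch of this calculation follows. It assumes ties receive the average of their ranks, which is a common convention but is not stated in the report; the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A nonlinear but monotonic relationship (y = x**3) still gives rho close to 1.
print(spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```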

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
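
The calculation, and the invariance under changes of location and (positive) scale, can be sketched in pure Python as follows. The data values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's r: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # The 1/n factors in covariance and standard deviations cancel out.
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson(xs, ys)

# Shifting and rescaling each variable separately leaves r unchanged.
r_shifted = pearson([2 * x + 5 for x in xs], [0.5 * y - 3 for y in ys])

print(r, r_shifted)
```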

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
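
The pair-counting definition can be sketched directly. Note that this is the simple variant often called tau-a, which matches the description in the text; statistical libraries frequently use a tie-adjusted variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant, 3 pairs
```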

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
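
A minimal sketch of a bias-corrected Cramér's V is shown below. It uses the correction as it is commonly stated for Bergsma's estimator (shrinking φ² and the table dimensions by terms in 1/(n-1)); whether the report generator implements it in exactly this way is an assumption, and the function name is illustrative.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    r, k = len(row), len(col)
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(perfect, independent)
```

With these toy inputs the perfectly associated pair scores close to 1 and the independent pair scores 0.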

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
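
One way to read nullity correlation is as the ordinary Pearson correlation between per-column presence indicators (1 if a value is present, 0 if missing). The sketch below follows that reading, which is an assumption about how the heatmap is computed; the sample columns are made up, with `None` marking a missing value.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None  # undefined when a column is fully present or fully missing
    return cov / sqrt(va * vb)


# Made-up columns: the two are always present or missing together.
fax = ["905-555-0100", None, "905-555-0101", None]
toll_free = ["1-800-555-0100", None, "1-888-555-0101", None]
print(nullity_correlation(fax, toll_free))
```

A value near 1 means two columns tend to be present or missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means no relationship.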

Sample

XYFIDBusinessIDNameAddressStreetNoStreetNameBldgNoUnitNoPostalCodeLocationWardNAICSCodeNAICSDescrNAICSTitlePhoneFaxTollFreeEMailWebAddressEmplRangeEmplUpdateSector_DesCENT_XCENT_Y
0-79.68982943.64418111055Golf Trends Inc.300 Ambassador Dr300Ambassador DrL5T 2J3Gateway EA (East)5414470WholesaleAmusement and Sporting Goods Wholesaler-Distributors905-795-8900905-795-89881-800-668-1101lfinch@golftrendsinc.comwww.golftrendsinc.com10 to 192015/10/31 00:00:00+00605668.25384.833187e+06
1-79.68941943.64498821057Apex Graphics Inc.320 Ambassador Dr320Ambassador DrL5T 2J3Gateway EA (East)5323120ManufacturingSupport Activities for Printing905-795-9575905-795-8775prepress@apexgraphics.comwww.apexgraphics.com20 to 492016/10/31 00:00:00+00605699.93704.833277e+06
2-79.68941943.64498831058Sands, John & Associates Limited320 Ambassador Dr320Ambassador DrL5T 2J3Gateway EA (East)5323120ManufacturingSupport Activities for Printing905-795-9519905-795-877550 to 992015/10/31 00:00:00+00605699.93704.833277e+06
3-79.68941943.64498841060Printmedia-Tackaberry Times320 Ambassador Dr320Ambassador DrL5T 2J3Gateway EA (East)5323119ManufacturingOther Printing905-564-8121905-564-7395info@printmedia.cawww.printmedia.ca1 to 42015/10/31 00:00:00+00605699.93704.833277e+06
4-79.69066443.64549351061S W R Industries Ltd.321 Ambassador Dr321Ambassador DrL5T 2J3Gateway EA (East)5417230WholesaleIndustrial Machinery, Equipment and Supplies Wholesaler-Distributors905-564-8080905-564-5003shsieh@swrltd.comwww.swrltd.com5 to 92015/10/31 00:00:00+00605598.64424.833332e+06
5-79.69027743.64637261063Crossdock Freight Solutions361 Ambassador Dr361Ambassador DrL5T 2J3Gateway EA (East)5488519TransportationOther Freight Transportation Arrangement905-670-4937905-670-9475customerassist@crossdocksystems.comwww.crossdockfreight.com20 to 492015/10/31 00:00:00+00605628.28384.833430e+06
6-79.68987743.64691471065Green Belting Industries Ltd.381 Ambassador Dr381Ambassador DrL5T 2J3Gateway EA (East)5325510ManufacturingPaint and Coating Manufacturing905-564-6712905-564-67091-800-668-1114customerservice@greenbelting.comwww.greenbelting.com50 to 992016/10/31 00:00:00+00605659.56464.833490e+06
7-79.63427943.64040481073Dafco Filtration Group Corporation5390 Ambler Dr5390Ambler DrBL4W 1G9Northeast EA (West)5333413ManufacturingIndustrial and Commercial Fan and Blower and Air Purification Equipment Manufacturing905-602-1010905-629-1124info@dafcofiltrationgroup.comwww.dafco.ca50 to 992016/10/31 00:00:00+00610155.41824.832840e+06
8-79.63284443.64133791074Ace Trans Inc.5391 Ambler Dr5391Ambler Dr1L4W 1H1Northeast EA (West)5493110TransportationGeneral Warehousing and Storage905-625-3000905-625-6049info@acetrans.cawww.acetrans.ca1 to 42016/10/31 00:00:00+00610269.46404.832945e+06
9-79.63781543.642638101077Petro Maxx5510 Ambler Dr5510Ambler Dr1 to 2L4W 2V1Northeast EA (West)5541490ProfessionalOther Specialized Design Services905-206-0040blake@petromaxx.cawww.maxxgroupofcompanies.ca20 to 492015/10/31 00:00:00+00609866.14524.833083e+06
XYFIDBusinessIDNameAddressStreetNoStreetNameBldgNoUnitNoPostalCodeLocationWardNAICSCodeNAICSDescrNAICSTitlePhoneFaxTollFreeEMailWebAddressEmplRangeEmplUpdateSector_DesCENT_XCENT_Y
14592-79.64798443.6496421459386043Safe Driver Services Inc.1295 Shawson Dr1295Shawson Dr201L4W 1C4Northeast EA (West)5523120FinanceSecurities Brokerage905-565-6777905-564-0995echang159@gmail.comwww.safedriverservice.com50 to 992015/10/31 00:00:00+00609033.27404.833848e+06
14593-79.64121143.6008181459486289Leadbox55 Village Centre Pl55Village Centre Pl106, 307, 309L4Z 1V9DT Core4541514ProfessionalComputer systems design and related services (except video game design and\nComputer systems design and related services (except video game design and development)905-361-1188info@leadboxhq.comwww.leadboxhq.com5 to 92015/10/31 00:00:00+00Automotive609668.25724.828434e+06
14594-79.64934843.6484351459586045Pedco Supply Inc.1235 Shawson Dr1235Shawson Dr1&3L4W 1C4Northeast EA (West)5416120WholesalePlumbing, Heating and Air-Conditioning Equipment and Supplies Wholesaler-Distributors905-696-87001 to 42015/10/31 00:00:00+00608925.44404.833712e+06
14595-79.70562243.5590651459686290Credit Valley Imaging Associates2300 Eglinton Ave W2300Eglinton Ave WG02L5M 2V8Central Erin Mills MN8621510Health CareMedical and Diagnostic Laboratories905-828-0653905-828-0765manager@civa.cawww.cvia.ca20 to 492015/08/18 00:00:00+00604541.84764.823714e+06
14596-79.61368943.6325601459786046My Delish1550 South Gateway Rd1550South Gateway Rd6AL4W 5G6Northeast EA (West)3722512AccommodationLimited-service eating places905-270-98921 to 42016/10/31 00:00:00+00611830.70304.831996e+06
14597-79.70098543.5651621459886293Dr. Poon Costmetic1675 The Chase1675The Chase30L5M 5Y7Central Erin Mills NHD11621390Health CareOffices of All Other Health Practitioners905-820-1398905-607-6048jcpoonmd@gmail.comwww.drpooncosmetic.com1 to 42015/07/28 00:00:00+00604905.75044.824397e+06
14598-79.65538543.6493081459986049Superior Exotics6033 Shawson Dr6033Shawson Dr28L5T 1H8Northeast EA (West)5532111Real EstatePassenger Car Rental647-974-6919danield@superiorexotics.cawww.superiorexotics.ca1 to 42015/10/31 00:00:00+00608437.00724.833801e+06
14599-79.70098543.5651621460086294ProComp Systems1675 The Chase1675The Chase35L5M 5Y7Central Erin Mills NHD11443144RetailComputer and software stores905-828-6688sales@microcompsystems.cawww.microcompsystems.caNaNNaNAutomotive604905.75044.824397e+06
14600-79.62557943.6365531460186057Gotham Central1400 Aimco Blvd1400Aimco Blvd1L4W 1E1Northeast EA (West)5451120RetailHobby, Toy and Game Stores905-212-9992gothamcentral@rogers.com1 to 42016/10/31 00:00:00+00610864.24524.832424e+06
14601-79.62557943.6365531460286058S R S Automotive Tinting Solutions1400 Aimco Blvd1400Aimco Blvd7L4W 1E1Northeast EA (West)5811122Other ServicesAutomotive Glass Replacement Shops905-766-33671-888-496-3238srstints@gmail.comwww.srstints.ca1 to 42016/10/31 00:00:00+00610864.24524.832424e+06